DOCUMENT RESUME

ED 082 763                                                  LI 004 484

AUTHOR        Jewell, Sharon; Brandhorst, W. T.
TITLE         Search Strategy Tutorial; Searcher's Kit.
SPONS AGENCY  National Inst. of Education (DHEW), Washington, D.C.
PUB DATE      10 Oct 73
NOTE          86p.; (3 references); ERIC Data Base Users
              Conference, Columbus, Ohio, October 10-12, 1973

EDRS PRICE    MF-$0.65 HC-$3.29
DESCRIPTORS   Computers; Data Bases; Data Processing; Educational
              Research; *Information Retrieval; Information
              Seeking; Relevance (Information Retrieval); *Search
              Strategies; Tutorial Programs
IDENTIFIERS   *Educational Resources Information Center; ERIC



ABSTRACT

The Educational Resources Information Center (ERIC)
system's computerized data base was the focus of a three-hour
tutorial session on search strategies. This document is the workshop
manual used by the tutorial participants. The discussion of the input
phase of a computer search covers identification of the user
population, receiving the inquiry, and the types of services offered.
The actual mechanics of searching includes general principles of good
searching, search theory and general manipulative capabilities, and
specific properties of the ERIC system that affect computer search
capabilities. There is a practice session in which three searches are
structured step-by-step. The output phase of a computer search
includes a discussion of output formats, output evaluation, and
statistical records-keeping. Eighteen technical notes discuss various
aspects within each of these phases. Notes on the vocabulary
improvement program for the "Thesaurus of ERIC Descriptors" are
appended. (SJ)



ERIC 




FILMED FROM BEST AVAILABLE COPY 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT
OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.

SEARCH STRATEGY TUTORIAL



SEARCHER'S KIT 



Prepared by 



Sharon Jewell 
Assistant Director 
ERIC Clearinghouse on Library and Information Sciences 



and 



W.T. Brandhorst 
Director 

ERIC Processing and Reference Facility 



for the 



ERIC Data Base Users Conference

Columbus, Ohio
October 10-12, 1973






SEARCH STRATEGY TUTORIAL 



OUTLINE 



I. INTRODUCTION
II. INPUT PHASE

A. User Population - Knowing your user group; User needs, expertise,
background; Purpose of information; Application of information
conveyed (teaching, research, student, parent, administrator);
User education, relations; Mandate of information center/collection.

Technical Notes: None

B. Receiving the Inquiry/Question - Personal visit, telephone, letter/
telegram; Search question negotiation (asking questions, completing
forms, etc.); Determine area of interest by going from general to
specific; Other parameters: volume of output desired, years covered,
types of publications, recall vs. relevance, known "hits"; Use
subject specialists.

Technical Notes: #1. Search Negotiation. Understanding the Request

C. Types of Service Offered - Retrospective vs. Current Awareness;
Manual search vs. Computer search; Referrals; Pre-prepared bibliographies;
Telephone responses; Form responses; All-purpose packets of information.

Technical Notes: #2. Advantages of Computer Searching Over
Manual Searching

#3. Two Modes of Searching: Retrospective Searching/
Current Awareness

III. MECHANICS OF SEARCHING 

A. General Principles of Good Searching - Know your: data base,
reference tools, search system (software); Make use of: user inputs,
feedback, previous work, statistics.

Technical Notes: None

B. Search Theory and General Manipulative Capabilities - Basic logical
operators; Boolean logic; Symbology; Venn diagrams; Truth Tables

Technical Notes: #4. Search Symbology (Operators, Venn Diagrams, etc.)

#5. Use of Parentheses (To Avoid Ambiguity)

#6. Weights. Sorting Output By Weight

#7. Arithmetic Operators

#8. Text/String Searching

#9. NOT Logic



C. Properties of the ERIC System - Data elements available for
searching; Indexing vocabularies; Indexing practices; Descriptor/
Identifier frequency statistics; Reference tools.

Technical Notes: #10. Data Elements Available for Searching

#11. Levels of Generality and Specificity

#12. Major-Minor Index Terms

#13. Identifiers

#14. Importance of Knowing Descriptor Frequency
(Posting) Statistics

#15. Common Descriptor Selection Problems (Examples)

IV. PRACTICE SESSION IN STRUCTURING SEARCHES

Search #1 - Text Editing

Search #2 - Social Studies Instruction

Search #3 - Criterion Referenced Tests

V. OUTPUT PHASE

A. Output Formats - Printout format (continuous or unitized); Data
elements displayed (options); Callouts; Introductory explanatory
matter; Administrative data (requester, date, search title,
number of hits, search equation); Sequence (newest or oldest first;
other sorts); CRT display.

Technical Notes: None

B. Output Evaluation - No "hits"; Relevance; Recall; Output volumes;
Evaluation form; Other feedback techniques.

Technical Notes: #16. No Hits - What to Do

#17. Recall and Relevance

#18. Output Volumes. What is Too Little? What is Too
Much? Hit Limits. Reverse Chronological Sort.

C. Statistics and Miscellaneous - Analysis of requests; Geographic 
distribution; Types of requester; Topical areas; Increasing throughput; 
Use ^o^h^\or searches. 

Technical Notes: None 






APPENDIX A: Vocabulary Improvement Program 






II. INPUT PHASE 






PAPER #1



SEARCH NEGOTIATION. UNDERSTANDING THE REQUEST



This topic may seem so obvious that nothing useful can be said about it. 
Nevertheless, it continues to be an underestimated factor in the conduct of 
a successful search service. 

Too frequently the searcher is impatient to take the initial inquiry
data available and "run with it". A superior search would have resulted
if the searcher had first asked the user a series of basic questions. It
is important to get from the user all the parameters he has to offer,
e.g., alternative ways of describing the topic, closely related topics,
does the topic have a geographic or institutional attachment, what years
of publication are desired, what volume of output is desired, which is
more important - recall or relevance, is he aware of any good documents
on the subject already in the system, are any of the major authors who
write on this topic known, are only certain academic or grade levels
involved, etc.

Practice varies as to whether or not the reference center requires the
user to state the inquiry in the standardized language of the system.
Sometimes in order to save time and manpower, the user is asked, for example, to
select terms representing the topic of interest from an authority list such
as the ERIC Thesaurus. This is almost always dangerous in that the user is
not fully familiar with the vocabulary, the definitions of terms, the ways
that they have been used in indexing, etc. Forcing him to use the authority
list restricts him and, in effect, lessens the flow of information from
user to searcher. Unless it is essential for economic reasons to make the
user perform some of the search labors, it is much more effective to ask
him to state his inquiry in narrative form in his own language. Encourage
an uncensored and unlimited description. This provides the searcher with the
maximum raw material/clues/intelligence with which to help solve the problem
posed by the inquiry.

If there is not voice contact between user and searcher, then obviously
the inquiry is reduced to written form by the user before being submitted.
This may or may not be true if there is face-to-face or telephonic contact.
In the latter case the phrase "search negotiation" can be particularly apt.
As the searcher asks the user to state the problem, what is then said can
trigger questions by the searcher. As specifications are identified, the
searcher can immediately react with the user, informing him as to whether the
system can handle that aspect and, if not, what alternatives exist. For
example, the user may specify "6 year olds" in his question. The searcher
may inform him that the terms EARLY CHILDHOOD (covering ages up to 6) and
CHILDHOOD (covering ages 7-12) are in use by the system and ask him which
would be preferable in this case. The user may specify a disability, a grade
level, and a curriculum area in his question. The searcher can determine
which of these concepts is prime. If it is the disability, then the other
factors should not be in a commanding and limiting position in the search.
This kind of immediate, real-time negotiation can clearly lead to great
refinement of the question. The things the user thought were so obvious they
didn't need stating are elicited by the skillful questioning of the searcher.
The improved understanding of the request usually leads to more accurate
strategy and a user more satisfied with the end product.



PAPER #2



ADVANTAGES OF COMPUTER SEARCHING OVER MANUAL SEARCHING



The purpose of this paper is to list some of the conditions that can alert
a reference center to the possibility that a computer search to answer a
particular query may be justified. Obviously there are many situations where
a computer search is not justified and cannot compete in terms of time or cost
with a simple straightforward manual approach. If someone is interested in what
has entered the ERIC file on the subject of READING over the last quarter, the
most efficient solution is the conventional one of going to the latest issues
of RIE (perhaps using the last cumulative index), looking in the indexes, and
perhaps photocopying a few pages.

There are other situations, however, where the computer can add a dimension
to a search not obtainable manually:



1. Multi-Factoral Searches



This is a search involving more concepts (and therefore terms) than can
reasonably be held in the mind, much less manipulated logically, while
carrying out a manual search. For example, the intersection of three
large "families" of terms, each involving perhaps 5-7 closely related
terms, can result in a search requiring cognizance over a total of 20
terms or more. Specifically, imagine that the patron is looking for
material on the use of innovative teaching tools (e.g., Audiovisuals) in
non-public schools, particularly in the smaller schools, such as Church
schools. The strategizing could easily result in the following kind of
three family intersection:

PARAMETER A                AND   PARAMETER B                AND   PARAMETER C

INSTRUCTIONAL AIDS               INSTRUCTIONAL INNOVATION         CATHOLIC SCHOOLS
   OR                               OR                               OR
INSTRUCTIONAL MATERIALS          INNOVATION                       CATHOLIC ELEMENTARY SCHOOLS
   OR                               OR                               OR
INSTRUCTIONAL MEDIA              EDUCATIONAL INNOVATION           PAROCHIAL SCHOOLS
   OR                                                                OR
INSTRUCTIONAL TELEVISION                                          PRIVATE SCHOOLS
   OR                                                                OR
TEACHING MACHINES                                                 PROPRIETARY SCHOOLS



The above search involves only 13 terms but one can readily see that to
perform it manually would be impractical.
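The three family intersection can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular ERIC search system: the posting sets and accession numbers below are invented, and the helper names `family` and `intersect_families` are hypothetical. Terms within a family are ORed together, and the families are then ANDed.

```python
# Hypothetical postings: descriptor -> set of accession numbers (invented data).
postings = {
    "INSTRUCTIONAL AIDS": {101, 102, 205},
    "INSTRUCTIONAL MEDIA": {102, 310},
    "TEACHING MACHINES": {411},
    "INNOVATION": {102, 205, 411, 500},
    "EDUCATIONAL INNOVATION": {310},
    "CATHOLIC SCHOOLS": {102, 600},
    "PRIVATE SCHOOLS": {310, 411, 700},
}


def family(terms):
    """OR together the postings of every term in one family."""
    hits = set()
    for term in terms:
        hits |= postings.get(term, set())
    return hits


def intersect_families(*families):
    """AND the families together: a hit must appear in every family."""
    result = family(families[0])
    for terms in families[1:]:
        result &= family(terms)
    return result


parameter_a = ["INSTRUCTIONAL AIDS", "INSTRUCTIONAL MEDIA", "TEACHING MACHINES"]
parameter_b = ["INNOVATION", "EDUCATIONAL INNOVATION"]
parameter_c = ["CATHOLIC SCHOOLS", "PRIVATE SCHOOLS"]

hits = intersect_families(parameter_a, parameter_b, parameter_c)
print(sorted(hits))
```

Even with seven terms the bookkeeping is tedious by hand; with 13 or 20 terms and real posting counts it becomes the kind of clerical work the computer handles well.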

2. Large Files



Any request where the size of the file to be searched is in the tens or
hundreds of thousands is a potential candidate for the employment of the
computer. The sheer clerical work involved in interrogating files of this
size and recording the results argues for the use of that "super-clerk",
the computer. The search may be a simple one or two term search. What one
is buying, therefore, is not logical or multi-factoral capability so much as
the sheer convenience of letting the computer do the scanning, the selecting,
and the assembling into a nice convenient package of the output.



3. Knowledge in New Patterns

It has been argued that computerized searching will more and more become a
tool for those working with new configurations of knowledge. Because it
provides the capability for doing highly complex searches, organized on any
of the fields in a record, upon massive quantities of data, the machine
search will promote the extraction of information in new patterns, rather
than being merely a quicker way of doing old things. It is thought that
the computer approach will be able to detect the coincidence of concepts
that even the indexer may not have realized at the time; or that the computer
will be able to detect statistical patterns across numerous accessions that
would have escaped the human inputters of data simply because they work one
item at a time. This argument has been made particularly strongly for those
systems dealing with natural text, e.g., a system analyzing the complete
text of the Dead Sea Scrolls; or for those systems dealing with large
amounts of numerical data, e.g., Census Tapes analyzers.

4. Multiple Searches

Volume alone may argue for the computer approach. If a center must fulfill
the search requests of many patrons during essentially the same period of
time, the speed and accuracy of the computer can become powerful allies.
Obviously heavy demands and "crash" demands are not new to service
organizations, but as the pressures rise to extend services and increase
productivity, without adding staff, the computer may provide a way out. This
can extend both to the primary situation where the need is for single copies
of different data (e.g., searches for 25 different professors preparing
reading lists) and to the secondary situation where the need is for
multiple copies of the same data (one search can be printed on multi-part
paper or printed several times).



PAPER #3



TWO MODES OF SEARCHING:

RETROSPECTIVE SEARCHING
CURRENT AWARENESS



Let us assume that a reference center has just acquired a new data base
such as ERIC. The first item they received was the back-file of records as
they existed at the time the order was processed. Later they receive smaller
sections, update tapes (either monthly, quarterly, or annually) coming in at
regular intervals.

Given this environment, we might project that the user would ask for
one big retrospective search initially, followed by regular current searches
of the update tapes, as they arrive.

1. Retrospective Searching

The retrospective search attempts to examine the entire data base
comprehensively on a given topic. It is a large-scale effort usually
done one time for any particular patron or topic. It requires careful
search negotiation and coordination in order to be sure that it is
precisely what the patron wants and that it will achieve the desired
degree of completeness. The care is necessary because retrospective
searches are typically fairly costly and have large outputs. It is
prudent to process them carefully and to avoid wasteful mis-steps.

A typical problem in strategizing a retrospective search is to
avoid excessive output. An ERIC search, for example, is being run
against a file of about 150,000 accessions. Some of the more heavily
posted terms have the ability to dump as many as 7,500 accessions in your
lap, or 5% of the file. It is more likely that you are looking for no
more than 150 hits (.1%). This means that your search must be tightly
written and that it must take heed of the posting statistics for the terms
that it is using. At the same time, as a comprehensive search, it must
make certain that it utilizes all the terms that apply to the topic in
question. Retrospective searches typically use a lot of terms and
complicated logic, with intersections based on posting levels.
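The arithmetic behind these figures can be checked with a short sketch. It assumes the numbers given in the passage (a file of about 150,000 accessions, a heavily posted term returning about 5% of the file, and a target of about 150 hits); the function name `share_of_file` is only illustrative.

```python
FILE_SIZE = 150_000  # approximate number of accessions cited in the text


def share_of_file(hits, file_size=FILE_SIZE):
    """Return the number of hits as a percentage of the whole file."""
    return 100.0 * hits / file_size


heavy_term_hits = 7_500  # output of one heavily posted term (5% of the file)
target_hits = 150        # a more typical goal for a finished search

print(f"heavily posted term: {share_of_file(heavy_term_hits):.1f}% of the file")
print(f"target output:       {share_of_file(target_hits):.1f}% of the file")
print(f"reduction needed:    {heavy_term_hits // target_hits}-fold")
```

On these assumptions a single heavily posted term must be narrowed roughly fifty-fold, which is why posting statistics matter so much when writing the intersections.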

2. Current Awareness 

In 1958, H. P. Luhn began to describe in the technical literature
the pattern of service that came to be called first Selective Dissemination
of Information (SDI) and which now tends to be referred to as "Current
Awareness" searching. This involves, quite simply, the periodic running
of a customized search for a particular individual against the latest
data available. The search itself did not have to be re-submitted by
the user; rather it was kept on file at the reference center. It was
carefully tailored to fit the user's needs and might even have some
extremely idiosyncratic characteristics, such as parameters relating to
the user as author, the journals he subscribes to, the laboratory in
which he works, etc. This so-called "profile" of the user was regularly
kept up-to-date by action of the center staff, the user himself, or both.
Letting the user manage his own profile can be dangerous, but has the
advantage of letting him "play the game", thereby involving him intimately
in the information system and feedback to it.

The profiles are typically stored as a series of searches and run
against incoming update tapes. Lancaster has made much of this by stating:

"The principal distinction between SDI and retrospective
searching systems is that in the case of the latter, a
user request precipitates a search of the document file,
whereas, in the former, a document precipitates a search
of the user file".

The Current Awareness approach, using computers, increases the scale
on which individually tailored services can be undertaken by a busy
reference center. It also permits many refinements in service. Perhaps
its most important contribution, however, has been that it represents an
active dissemination of information, rather than a passive response.
Librarians have often been criticized for being mere preservers of records
but Current Awareness fits in with the more dynamic and modern role of
being specialists in the transmission of information to those who need
it. Current Awareness takes the initiative rather than waiting for the
user to come in the door.

A typical problem in strategizing a current awareness profile is to
ensure that some output is achieved. If a profile is run monthly against
the ERIC data base, it is searching only 1,000 - 1,300 records; if it
is run quarterly, it is searching only around 3,500 records. Against
such a small fraction of the entire data base it is necessary to structure
a search rather loosely, in order to guarantee hits. Remember that even
if the strategy were to dump 5% of the file (a disaster in retrospective
searching), in Current Awareness searching, against a monthly tape, this
would involve only 50 hits, an easily digestible quantity. Current Awareness
profiles usually, therefore, involve a lot of OR logic and few AND statements.
Posting data is relatively unimportant when constructing profiles. It is
definitely not appropriate to simply take a retrospective search on the
same topic and use it against update tapes without modification.
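A minimal sketch of a profile run follows, assuming each stored profile is nothing more than a list of descriptors joined by OR. The user names, accession numbers, and descriptor assignments are invented for illustration, and `run_profiles` is a hypothetical helper, not part of any actual ERIC software. Note that each incoming record is matched against the file of users, in the sense of the Lancaster quotation above.

```python
# Hypothetical stored profiles: user -> descriptors joined by OR.
profiles = {
    "user_a": ["READING", "READING INSTRUCTION", "READING SKILLS"],
    "user_b": ["TEACHING MACHINES", "PROGRAMED INSTRUCTION"],
}

# Hypothetical update tape: accession number -> descriptors assigned to it.
update_tape = {
    "ED000001": {"READING", "ELEMENTARY EDUCATION"},
    "ED000002": {"TEACHING MACHINES", "READING SKILLS"},
    "ED000003": {"SCHOOL FINANCE"},
}


def run_profiles(profiles, update_tape):
    """Match every incoming record against every stored profile (OR logic)."""
    notices = {user: [] for user in profiles}
    for accession, descriptors in sorted(update_tape.items()):
        for user, terms in profiles.items():
            if descriptors & set(terms):  # any one profile term is enough
                notices[user].append(accession)
    return notices


notices = run_profiles(profiles, update_tape)
for user, accessions in notices.items():
    print(user, accessions)
```

Because the profiles use only OR logic, even a three-record tape yields output for both users; a tightly ANDed retrospective strategy run against the same tape would very likely return nothing.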






III. MECHANICS OF SEARCHING 




A. General Principles of Good Searching 

1. Know the data base you are searching (what is in it; how it
was built; etc.).

2. Know the search system capabilities available and how to
use them most effectively.

3. Follow good search negotiation procedures with the requester,
e.g.:

a. Purpose for which information is to be used.

b. Type of search - retrospective or current awareness.

c. Amount of information expected - new or old, general
or specific.

d. Kind of information wanted - research, bibliographies, etc.

4. Use all reference tools available (including prior searches).

5. Make use of all search capabilities wherever possible.

6. Formulate strategy in terms of the user's request and
expectations; avoid personal biases of the information
retrieval specialist.

7. Evaluate output in terms of the original request. 

8. Obtain feedback from the user in order to be able to improve 
service. 

9. Keep statistics on user satisfaction, search results, etc., 
in order to improve service. 



B. Search Theory and General Manipulative Capabilities

BOOLEAN LOGIC

BOOLEAN                   ALGEBRAIC
CONNECTOR/     SYMBOL     REPRESENTATION     MEANING
OPERATOR

AND            ·          A · B              Both A and B must be
               &          A & B              'true' or must 'occur'.

OR             +          A + B              Either A or B, or both,
               |          A | B              must be 'true' or must
                                             'occur'.

NOT            ¬          A · B̄              A must be 'true' or
                          A & ¬B             must 'occur' and B
                          (A ¬ B)            must be 'not true' or
                                             must 'not occur'.



NOTE: In the above examples of symbols, the first version
employs the traditional logical notation while the second
shows conventional typographical symbols that can be
used on keyboards to input the desired logic to the
computer (e.g., via card-punches, video terminals,
magnetic tape typewriters, etc.). Remember that +
equals logical OR, not logical AND, and that it is an
inclusive "OR", not an exclusive "OR".






VENN DIAGRAMS 




ASSUME: A = Poems
B = Plays

#1 - Poems and Plays (only materials indexed with both terms)

#2 - Poems or Plays (all materials indexed with one or both terms)

#3 - Plays but not Poems (all materials indexed with the term Plays
excluding any indexed with the term Poems)
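The three Venn diagram regions correspond directly to set operations. The sketch below uses Python sets, with invented document identifiers standing in for materials indexed with each term.

```python
poems = {"doc1", "doc2", "doc3"}  # hypothetical items indexed with Poems (A)
plays = {"doc3", "doc4"}          # hypothetical items indexed with Plays (B)

both = poems & plays        # #1: Poems AND Plays (indexed with both terms)
either = poems | plays      # #2: Poems OR Plays (one or both terms, inclusive)
plays_only = plays - poems  # #3: Plays but NOT Poems

print(sorted(both))
print(sorted(either))
print(sorted(plays_only))
```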



TRUTH TABLES

 A     B     AND    OR     NOT

 0     0      0      0      0
 0     1      0      1      0
 1     0      0      1      1
 1     1      1      1      0

(AND = A AND B; OR = A OR B; NOT = A NOT B, i.e., A present and B absent)

1 = True or Present

0 = False or Not Present
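The truth table can be regenerated mechanically, which is a convenient way to verify it. The sketch below treats the NOT column as "A NOT B" (A present and B absent), which is the reading given in the Boolean logic table; the function name `truth_table` is only illustrative.

```python
from itertools import product


def truth_table():
    """Return rows (A, B, A AND B, A OR B, A NOT B), with 1 = present, 0 = absent."""
    rows = []
    for a, b in product((0, 1), repeat=2):
        rows.append((a, b, a & b, a | b, a & (1 - b)))
    return rows


print("A B AND OR NOT")
for a, b, and_, or_, not_ in truth_table():
    print(a, b, f"{and_:>2}", f"{or_:>2}", f"{not_:>3}")
```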



PAPER #4



SEARCH SYMBOLOGY (OPERATORS, VENN DIAGRAMS, ETC.)



There are a large number of symbols that have been used to represent
the operations that search system designers wish to perform. There is little
agreement on standard symbols even for the most common operators, the
Boolean AND, OR, NOT operators. Because of this, many designers have
preferred to use the words AND, OR, NOT rather than use symbols for them.
Some use of arbitrary symbology is inevitable in any system, however, and
the searcher must simply learn the language of the particular system he
is involved with. In addition to the operators mentioned above, there may
be symbols to indicate:

(1) that a root or string is being searched and not a whole word (See "Text/
String Searching");

(2) that the word must appear as a major index term and not a minor (See
"Major-Minor Index Terms");

(3) that two terms must appear adjacent to one another (See "Text/String
Searching");

(4) that certain sub-files should be searched and not others;

(5) that any N of X terms listed are sufficient to generate a hit;

(6) that the output should be sorted in reverse chronological order (latest
first);

(7) that only a set number of hits should be printed out;

(8) that the search should be saved in the system and be callable by
instruction for future use, etc.;

(9) that the data found should be greater than or less than a certain preset
value (See "Arithmetic Operators");

(10) that the "hits" should be sorted in order of potential relevance (See
"Weighted Searching").

Figure A, attached, is an attempt to display, in an easy to reference 
manner, the various ways that the common Boolean operators can be represented. 

Figure B, attached, is a comprehensive display of the representations 
of two terms in all possible logical combinations. 

Both figures can be useful references for the active searcher who is
not mathematically oriented and may occasionally have to verify what he
is doing.



FIGURE A

[Chart displaying the various ways that the common Boolean operators
(AND, OR, NOT) can be represented.]



FIGURE B

[Chart displaying the representations of two terms in all possible
logical combinations.]



PAPER #5 



USE OF PARENTHESES (TO AVOID AMBIGUITY) 



Parentheses are a common way of indicating which terms in a search you
want to have handled as one set. The ability to accept parentheses is a
function of the software being used to access the file; it has nothing to do
with the data itself. If a given search program does not accept parentheses,
however, it either has to use a different symbol for the same purpose, or it
has to have some conventions built into it to tell it in what order it is
going to handle the terms and operators in the formulated query.

Parentheses remove the ambiguity that is otherwise present in a search 
equation. For example: 

EQUATION WITHOUT PARENTHESES          POSSIBLE MEANINGS

A OR B AND C                          1. (A OR B) AND C

                                      2. A OR (B AND C)



As can be seen, whether the search program performed the OR operation
first or the AND operation first would make a great deal of difference as to
what data were retrieved. The searcher can avoid any problems by telling
the computer specifically in what order the terms should be combined. If
the searcher leaves out parentheses (or their equivalent) the search program
must either: (1) reject the query, stating that not enough information has
been provided to interpret it properly, or (2) process the query according
to previously agreed upon conventions; the usual conventions are that the
program processes NOT, AND, OR, in that order.
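How much the two readings of A OR B AND C differ can be checked by trying every combination of present and absent terms. The sketch below is only an illustration; the function names are hypothetical. Python's own `and` happens to bind more tightly than `or`, so the unparenthesized expression in Python follows the same convention as described above for AND and OR.

```python
from itertools import product


def or_first(a, b, c):
    """Interpretation 1: (A OR B) AND C."""
    return (a or b) and c


def and_first(a, b, c):
    """Interpretation 2: A OR (B AND C)."""
    return a or (b and c)


def unparenthesized(a, b, c):
    """No parentheses: Python evaluates `and` before `or`."""
    return a or b and c


# Every assignment of (A, B, C) on which the two interpretations disagree.
differences = [
    (a, b, c)
    for a, b, c in product((False, True), repeat=3)
    if or_first(a, b, c) != and_first(a, b, c)
]
print(differences)
```

The two readings disagree whenever A is present and C is absent: such a document is a hit under A OR (B AND C) but is rejected under (A OR B) AND C.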



GRAPHIC REPRESENTATION 







PROBLEM



1. A OR B AND C OR D
2. A OR B AND C NOT D



These are ambiguous equations. Using
the conventions referred to above, place
parentheses around them to show how the
computer would interpret them.



SOLUTION



1. A OR (B AND C) OR D

2. (A OR (B AND C)) AND NOT D



Note that if the search system did not
permit parentheses, but used conventions
instead, the searcher has no way of forcing
the computer to treat equation #1 as
(A OR B) AND (C OR D) and his searching
is seriously restricted.



Remember that parentheses are meant to make explicit, not to confuse. 
Multiple sets of parentheses may look formidable, but they actually make 
things easier to figure out. Just begin on the inside, treating the contents 
of a set of parentheses like the contents of a small box that can be put 
inside another box. Always run a quick check by counting up the number of 
left parentheses and see that they are equal to the number of right parentheses. 
The counts must be equal for the equation to be logically correct. 



Most search systems will reject a question in which the counts of left 
and right parentheses are not equal. 
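The quick check on parentheses is easy to automate. The sketch below is a hypothetical helper, not taken from any particular search system. It goes slightly beyond a simple count by also rejecting a right parenthesis that appears before its matching left parenthesis.

```python
def parentheses_balanced(query):
    """Return True if every left parenthesis has a matching right parenthesis."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a right parenthesis appeared before its partner
                return False
    return depth == 0


print(parentheses_balanced("(A OR (B AND C)) AND NOT D"))
print(parentheses_balanced("(A OR B AND C"))
```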

Parentheses give you power to specify exactly what you want done. A 
search system without parentheses (or their equivalent) would be highly limited 
indeed. 



EXAMPLE: 




AND (X OR Y) 






PAPER #6 



WEIGHTED SEARCHING 

Searching may be accomplished by assigning weight values to index terms
and then insisting that a document achieve a certain threshold weight value
before it is considered a "hit". This approach is in use in several search
systems both domestic and foreign. It has never seriously threatened the
popularity of the basic Boolean approach; however, it does have certain advantages
that have appealed to particular system designers. The two major advantages are
as follows:

1. In a weighted search it is much easier to request that any N of X terms be
present (e.g., any 2 of the 5 terms listed) to constitute a "hit"; this
specification can be difficult and laborious to code using Boolean
operators.

2. Weighted searching permits the searcher to arrange the output in order by
weight value, thereby approximating an arrangement in order of relevance.
In other words, weighting search terms injects a qualitative factor that
the Boolean situation doesn't permit. Under Boolean operators, an item
is either a "hit" or it isn't. Under weighted search conditions the
resultant "hits" have varying weights and they can be arranged by these
values.
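Both advantages can be seen in a small sketch. It assumes every term carries a weight of 1, so a threshold of 2 means "any 2 of the 5 terms listed"; the documents, term names, and the function `weighted_search` are invented for illustration.

```python
def weighted_search(documents, weights, threshold):
    """Return (weight, doc_id) pairs that meet the threshold, heaviest first."""
    hits = []
    for doc_id, terms in documents.items():
        score = sum(w for term, w in weights.items() if term in terms)
        if score >= threshold:
            hits.append((score, doc_id))
    # Sort by descending weight, then by document id for a stable order.
    return sorted(hits, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical documents and the index terms assigned to each.
documents = {
    "doc1": {"T1", "T2", "T3"},
    "doc2": {"T4"},
    "doc3": {"T2", "T5"},
    "doc4": {"T1", "T2", "T3", "T4", "T5"},
}
weights = {term: 1 for term in ("T1", "T2", "T3", "T4", "T5")}

ranked = weighted_search(documents, weights, threshold=2)
print(ranked)
```

Expressing "any 2 of 5" in Boolean form would require ORing together all ten pairs of terms, and the result would still carry no ranking.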

Even though we are not aware of any search system employing weights that 
is currently accessing the ERIC data base, it is possible that this may change 
in the future. It may also be useful to ERIC searchers to understand the 
pros and cons of the weighted approach and how it relates to the Boolean 
approach . 

The attached "Brief Communication" by a Facility staff member was
originally prepared in 1966; however, we believe it still conveys the basic
information that anyone contemplating the weighted search approach should
know. For those who may wish to probe more deeply into the topic the most
complete treatment yet prepared is: Pauline Angione's "On the Equivalence
of Boolean and Weighted Searching Based on the Convertibility of Query Forms",
M. A. Dissertation, Univ. of Chicago Graduate Library School, August 1968,
50 p.






Simulation of Boolean Logic 
Constraints Through the Use 
of Term Weights 

The evolution described below of one aspect of the NASA
Scientific and Technical Information Facility's machine
search system may be of general interest to the documentation
profession.

The Facility began operations in early 1962. The literature
search service, or "demand bibliography" service, as it
was then termed, was initially a very modest endeavor for
the simple reason that the data base upon which to search
had yet to be built. The first search programs concentrated
on the well-known Boolean logic capabilities in the searching
of inverted term files on magnetic tapes. This was
consistent with the contractor's (Documentation Incorporated)
prior R&D experience with so-called "uniterms"
and coordinate indexing systems.

A major change was effected, beginning in January 1965,
to a serial or linear type of file organization. The reasons
for this change were many and varied and need not concern
us in any detail here. They involved, primarily, efficiencies
in the file maintenance and update procedures and in the
journal index preparation procedures. Also, it was becoming
imperative to be able to search the file on a variety of
non-subject, administrative categories of information. At the
time of this change, additional capabilities were built into
the new "linear" search system. To supplement the basic
Boolean capability, we now, among other things, made
available to ourselves the following strategies that were
well known in the state-of-the-art: (1) a weighting technique,
(2) a "root" searching technique, and (3) a system
of nonsubject "limits."

The weighting technique permits the assignment of
arbitrary weight values to search terms and the specification
of a minimum weight which any document must achieve in
order to become a "hit."

"Root" searching permits queries on any desired generic
level of various entities, e.g., all contracts with the prefix
NAS8-; all report numbers with the prefix 11.AK-; all
authors with names beginning CAR-. It may soon be extended
to index terms, as in all terms beginning
"PyVXMOr", etc.
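"Root" searching amounts to matching on the leading characters of a field. A minimal sketch follows; the author names are invented, and `root_search` is a hypothetical helper rather than the Facility's actual routine.

```python
def root_search(values, root):
    """Return the values that begin with the given root (prefix), sorted."""
    return sorted(v for v in values if v.startswith(root))


# Hypothetical author entries.
authors = ["CARLSON, A.", "CARTER, B.", "CRANE, C.", "MCCARTHY, D."]

print(root_search(authors, "CAR"))
```

Note that the root must appear at the start of the entry: "MCCARTHY" contains the letters CAR but is not retrieved.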

The system of "limits" permits the specification of various
additional constraints on a search other than those involving
subject index terms. Nearly all the standard descriptive
cataloging elements fall within this system.

Each of these new capabilities has seen a great deal of 
u.<!e. The weighting teelmiQue. however, has particularly 
caMght the interact of the .seareJung stafl* and has .vesultcrd 
in some far-reaching devclopnjents . 

For instance, it is apparent that document weight becomes
a way of ranking search output in order of relevance.
Probably the first use that weights were put to within the
Facility was not to limit the output (the Boolean equation
did this) but to arrange it for either the user or the analyst
or perhaps both. This became extremely valuable in an
environment where search output received a human edit
before it was released. Arbitrary weight levels could be set
by the analyst above which relevance to the question was
assumed and below which his editorial effort was concentrated.

It also became apparent that the weighting technique
could, by itself in some situations, achieve exactly the same
results as a Boolean equation; cleverly assigned weights
could simulate such an equation. (In the equations that
follow, + denotes OR and · denotes AND.) For example, the
equation (1) A·(B + C + D) = Answer, can be completely bypassed
through the following weight assignments: A = 3, B = 1,
C = 1, D = 1; Weight Limit = 4. This becomes very useful
to know, for the calculation of weights was a much faster
computer process than the solving of a Boolean equation,
and the substitution could lead to significant computer time
savings. Other common types of substitutions were the
following:



    Equation                  Weights                       Weight Limit

    (2) A + B + C + D         A = 1, B = 1, C = 1, D = 1         1
    (3) A·B·C·D               A = 1, B = 1, C = 1, D = 1         4
    (4) A + (B·C·D)           A = 3, B = 1, C = 1, D = 1         3
    (5) (A + B) + (C·D)       A = 2, B = 2, C = 1, D = 1         2
    (6) (A + B)·(C·D)         A = 1, B = 1, C = 2, D = 2         5
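The substitutions listed above can be checked mechanically. The following is a minimal sketch, not part of the original search program: it reads + as OR and · as AND, assumes a record is a hit when the summed weights of the terms it carries reach the Weight Limit, and compares that test with the Boolean equation for all 16 combinations of the terms A through D. The function names are illustrative only.

```python
from itertools import product

TERMS = "ABCD"

def weight_hit(present, weights, limit):
    """A record is a hit when the weights of the terms it carries reach the limit."""
    return sum(weights[t] for t in TERMS if present[t]) >= limit

def simulates(boolean, weights, limit):
    """Compare the weight test with the Boolean equation on all 16 term combinations."""
    for values in product([False, True], repeat=len(TERMS)):
        present = dict(zip(TERMS, values))
        if boolean(**present) != weight_hit(present, weights, limit):
            return False
    return True

# Equation number -> (Boolean equation, term weights, weight limit)
CASES = {
    1: (lambda A, B, C, D: A and (B or C or D), dict(A=3, B=1, C=1, D=1), 4),
    2: (lambda A, B, C, D: A or B or C or D, dict(A=1, B=1, C=1, D=1), 1),
    3: (lambda A, B, C, D: A and B and C and D, dict(A=1, B=1, C=1, D=1), 4),
    4: (lambda A, B, C, D: A or (B and C and D), dict(A=3, B=1, C=1, D=1), 3),
    5: (lambda A, B, C, D: (A or B) or (C and D), dict(A=2, B=2, C=1, D=1), 2),
    6: (lambda A, B, C, D: (A or B) and (C and D), dict(A=1, B=1, C=2, D=2), 5),
}

results = {n: simulates(eq, w, lim) for n, (eq, w, lim) in CASES.items()}
for number, ok in results.items():
    print(f"equation ({number}): weights reproduce the Boolean result = {ok}")
```

Under these assumptions each of the six weight assignments gives the same hits as its equation.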



Various rules of thumb can easily be developed, and were,
for the proper assignment of weights in more complex
situations of the above basic types. However, no mathematical
formalization was ever attempted.

It was soon realized that though term weighting had its
advantages, nevertheless there were some equations that
could not be reduced in this way. Two of the most basic
are the following:

(7) (A + B)·(C + D)

(8) (A·B) + (C·D)

The above equations cannot be simulated through any
assignment to their terms of positive or negative weights,
in conjunction with a weight limit. This can be proved by
fairly simple algebraic techniques which will not be gone
into here.
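One way to see the difficulty, offered here as an illustration and not as the authors' own proof: for equation (8), the records carrying A and B, and those carrying C and D, must each reach the limit L, so the four weights together total at least 2L; but the records carrying only A and C, and only B and D, must each fall short of L, so the same four weights total less than 2L. Equation (7) leads to the same contradiction with the pairs exchanged. The sketch below simply searches small integer weights and limits by brute force; the search range and the function name are arbitrary choices made for the example.

```python
from itertools import product

def find_weights(boolean, span=3):
    """Search integer weights in [-span, span] and integer limits; return the
    first (weights, limit) that matches the Boolean equation on all 16
    combinations of A, B, C, D, or None if no assignment in range works."""
    combos = list(product([0, 1], repeat=4))
    wanted = [bool(boolean(*c)) for c in combos]
    values = range(-span, span + 1)
    for w in product(values, repeat=4):
        totals = [sum(wi * ci for wi, ci in zip(w, c)) for c in combos]
        for limit in range(-4 * span, 4 * span + 1):
            if [t >= limit for t in totals] == wanted:
                return w, limit
    return None

eq1 = lambda a, b, c, d: a and (b or c or d)       # equation (1)
eq7 = lambda a, b, c, d: (a or b) and (c or d)     # equation (7)
eq8 = lambda a, b, c, d: (a and b) or (c and d)    # equation (8)

found_1 = find_weights(eq1)
found_7 = find_weights(eq7)
found_8 = find_weights(eq8)
print("equation (1):", found_1)
print("equation (7):", found_7)
print("equation (8):", found_8)
```

The search finds an assignment for equation (1) and none for equations (7) and (8), which agrees with the statement above within the range tried.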

Continuing examination of the recalcitrant situations led
to the development of a special "Group Weight" system for
processing them. Essentially this involves "multiplying out"
the equation, identifying its sections or groups, and assigning
weights and weight limits for each section. Equation (7)
thus becomes the redundant (7A) A(C + D) + B(C + D)
and weights may be assigned as follows:

    Group A: A(C + D)     A = 3, C = 1, D = 1; Weight Limit = 4

    Group B: B(C + D)     B = 3, C = 1, D = 1; Weight Limit = 4
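The group assignment above can be sketched as follows. This is an illustration under one stated assumption: a record counts as a hit when it reaches the weight limit of at least one group (the text does not spell out how the groups are combined, but the + joining the two groups in (7A) suggests it). The names used are hypothetical.

```python
from itertools import product

# (term weights, weight limit) for each group of equation (7A)
GROUPS = [
    ({"A": 3, "C": 1, "D": 1}, 4),   # Group A: A(C + D)
    ({"B": 3, "C": 1, "D": 1}, 4),   # Group B: B(C + D)
]

def group_hit(terms, groups=GROUPS):
    """Hit when the record's terms reach the weight limit of at least one group."""
    return any(
        sum(w for t, w in weights.items() if t in terms) >= limit
        for weights, limit in groups
    )

def equation_7(terms):
    """The original Boolean equation (7): (A + B)·(C + D)."""
    return ("A" in terms or "B" in terms) and ("C" in terms or "D" in terms)

# Compare the two tests on every combination of the four terms.
mismatches = []
for flags in product([False, True], repeat=4):
    terms = {t for t, on in zip("ABCD", flags) if on}
    if group_hit(terms) != equation_7(terms):
        mismatches.append(terms)

print("combinations where the group weights disagree with equation (7):", mismatches)
```

With the weights shown, the list of disagreements is empty, so the two groups together reproduce equation (7).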



The search program is now in the process of being
changed to permit this technique. Logical equations will be
made an optional, not a mandatory, feature of a search
question. All types of logical equations may then be converted
solely to a system of term weights and weight limits.
Tests have been run comparing search times for ten problems
coded by equation against the same ten coded with
weights; both sets being run on our IBM-1410 search system
against the same single reel of the data base. Results
indicate that there is a 4 to 1 time advantage to running
in the weight or arithmetic mode. However, it is clear that
complicated equations can be both difficult and laborious to
code. The next step is therefore obvious. In those cases
where weights would be used mainly to simulate Boolean
logic for the sake of processing speed, there is no reason that
the program should not accept the equation and calculate
its own weight assignments. This is now being evaluated.

It is thought that this particular case history in the use
of weights may be of interest because of the widespread
current use of weights in machine search systems. Several
systems seem to be dropping the Boolean capability per se
altogether in favor of weights. The two are generally spoken
of in these situations as disparate entities. It is not that
simple. The closeness of the relationship is shown by the
fact that the weighting technique can be made to simulate
Boolean logic. However, in doing so, the weighting technique
can easily become too difficult for convenient human
use. On the other hand, the logical equation is perhaps the
most unambiguous and easily comprehensible way a search
question with a complex relationship of terms can be
organized and displayed. Our own solution is to keep both
strategies in order to take advantage of the unique capabilities
that each has to offer. At the same time, we are
attempting to take advantage of the newly realized (at least
as far as we are concerned) relationship between the two
systems by utilizing the fast weight calculation process as a
technique for internal computer solving of a logical
equation.

W. T. Brandhorst 

Assistant Director for Operations
NASA Scientific and Technical
Information Facility
and
Documentation Incorporated
Bethesda, Maryland



ARITHMETIC OPERATORS



Arithmetic Operators are useful in the searching of data bases that
have numerical fields or components, e.g., Census Tapes. These operators
instruct the computer to proceed by making simple arithmetic tests of the
data fields examined. Usually the test specifies that a certain range of
data must be found in the field rather than specifying a specific number.

The common Arithmetic Operators are:

                                            POSSIBLE ALPHABETIC
    OPERATOR                     SYMBOL     REPRESENTATION

    Equal To                       =             EQ
    Greater Than                   >             GT
    Less Than                      <             LT
    Greater Than or Equal To       ≥             GE
    Less Than or Equal To          ≤             LE

Note that NOT logic in this environment can be handled by asking for
the reverse condition. For example, if you wish to eliminate all hits
with a publication date earlier than 1960 (NOT < 1960), this is equivalent
to specifying that all hits have a publication date of Greater Than or
Equal To 1960, i.e., ≥ 1960 is the reverse of < 1960.
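As an illustration of the operators and of the "reverse condition" just described, the sketch below maps the alphabetic representations onto comparison functions and applies them to a numeric field. The record layout and the field name pub_year are invented for the example; they do not describe any particular search system.

```python
import operator

# Alphabetic representation -> comparison function
OPERATORS = {
    "EQ": operator.eq,
    "GT": operator.gt,
    "LT": operator.lt,
    "GE": operator.ge,
    "LE": operator.le,
}

def select(records, field, op, value):
    """Keep the records whose numeric field passes the arithmetic test."""
    test = OPERATORS[op]
    return [r for r in records if test(r[field], value)]

# Hypothetical records with a numeric publication-year field.
records = [
    {"id": 1, "pub_year": 1958},
    {"id": 2, "pub_year": 1960},
    {"id": 3, "pub_year": 1967},
]

# Asking for GE 1960 gives the same result as removing everything LT 1960.
ge_1960 = select(records, "pub_year", "GE", 1960)
earlier = select(records, "pub_year", "LT", 1960)
not_earlier = [r for r in records if r not in earlier]

print([r["id"] for r in ge_1960])
print(ge_1960 == not_earlier)
```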

Arithmetic Operators are not usually found in search systems designed
to access bibliographic data. The reason is, of course, that the data
elements found in such systems do not lend themselves to this type of
handling. It must be stressed, however, that the availability of Arithmetic
Operators to a searcher is a function of the software at one's disposal;
it is not a function of the data file.



PAPER #8 



TEXT/STRING SEARCHING 



Most search systems, and certainly most search systems operating
against the ERIC data base, rely on searching the Descriptor and Identifier
fields for subject access to the file.

There are systems, however, which are designed to treat literally every
word of the total record as a potential access point. These are generally
called "full text" retrieval systems and they rely on "string searching"
approaches. The "full text" usually refers to the full text of whatever
is input as a record and not the full text of the actual document (which
would be very expensive to key and store). What one finds, therefore, is
that most "full text" systems are operating against the words in the title
and abstract fields, as well as any indexing term field there may be.
(Exceptions to this occur in the legal fields where the search may be against
the actual full text of a statute).

Because such systems operate on natural, unstandardized language, they
are faced with all the problems caused by different endings and word forms,
e.g., steal, steals, stealing, stealer, stolen, stole. This is why they
generally provide string searching capabilities, permitting the searcher
to specify given sequences of characters no matter where they appear,
e.g., the root or string STEAL, no matter what ending it might have. They
also take advantage of the fact that topics written about in close proximity
to one another will generally be related. This is done by permitting the
searcher to specify that A and B must appear in the same sentence for the
item to be a hit; or they must appear within two words of each other, e.g.,
all items where INFORMATION and RETRIEVAL appear within two words of
each other, as in "Information Storage and Retrieval". This is often
called an "adjacency" capability.
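The two capabilities described above, root (string) searching and adjacency, might be sketched as follows. This is only an illustration: the function names are invented, a "word" is taken to be a run of letters, and "within two words" is read as allowing at most two intervening words, which is the reading that makes "Information Storage and Retrieval" a hit.

```python
import re

def words(text):
    """Split text into upper-case words (runs of letters)."""
    return re.findall(r"[A-Za-z]+", text.upper())

def root_match(text, root):
    """True if any word in the text begins with the given root."""
    root = root.upper()
    return any(w.startswith(root) for w in words(text))

def within(text, first, second, max_between=2):
    """True if the two words occur with at most max_between words between them."""
    tokens = words(text)
    first_at = [i for i, w in enumerate(tokens) if w == first.upper()]
    second_at = [i for i, w in enumerate(tokens) if w == second.upper()]
    return any(
        i != j and abs(i - j) - 1 <= max_between
        for i in first_at
        for j in second_at
    )

print(root_match("He was caught stealing", "STEAL"))
print(root_match("The stolen goods were recovered", "STEAL"))
print(within("Information Storage and Retrieval", "information", "retrieval"))
print(within("Information about the storage and retrieval of data",
             "information", "retrieval"))
```

Note that the root STEAL picks up "stealing" but not "stolen" or "stole"; irregular word forms still have to be entered separately.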

On the theory that words mentioned early in an abstract are more
important than words mentioned later on, some full text systems provide the
capability of specifying that the terms must appear in the first so many
sentences (or the first 50%) of the abstract.

The argument as to whether retrieval is better when relying on
standardized index terms assigned by human indexers, or whether it is better
when relying on the natural untouched text of the item itself, is sometimes
called "The Great Debate" in information retrieval work.






PAPER #9 



NOT LOGIC 



Negation, or the exclusion of items because they have a particular
property, is worth a short write-up because it is often either misunderstood
or mis-applied. Some searchers are afraid to use it and never make it part
of their armamentarium; others use it too much without realizing how much
they might be missing as a result.



NOT logic usually has its own symbol and takes precedence in the
hierarchy of machine operations. If a negative operator is interspersed in
a logical equation with other operators, you can expect it to function first
and most restrictively.



In other words:

    Equation:                         A OR B NOT C

    Possible Interpretations:         (A OR B) NOT C
                                      A OR (B NOT C)

    Normal Machine Interpretation:    (A OR B) NOT C

    [Diagrams: (A OR B) NOT C, MOST RESTRICTIVE;
               A OR (B NOT C), LEAST RESTRICTIVE]



Even the most enthusiastic users of NOT logic admit that it can be a
highly restrictive tool, often eliminating the good with the bad. Some
recommend using it only after one search has already been done without it;
using it then to eliminate known irrelevancies. Others recommend what is
essentially the same thing, that the set of items being eliminated be
examined to see just what is being lost. Both of these recommendations are
recognitions of the fact that an item may meet every one of your positive
specifications, but if it contains the single parameter negated, it can be
excluded from the final printout.
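The difference between the two interpretations of A OR B NOT C, and the advice to look at what NOT removes, can be shown with ordinary sets. The accession numbers below are made up for the illustration.

```python
# Hypothetical sets of accession numbers posted to terms A, B, and C.
A = {1, 2, 3}
B = {3, 4, 5}
C = {2, 5}

most_restrictive = (A | B) - C      # (A OR B) NOT C
least_restrictive = A | (B - C)     # A OR (B NOT C)

# The items that NOT eliminates under the most restrictive reading;
# these are the ones worth examining before they are discarded.
lost = (A | B) & C

print(sorted(most_restrictive))
print(sorted(least_restrictive))
print(sorted(lost))
```

Item 2 meets the positive specification (it is posted to A) but is dropped under the most restrictive reading because it is also posted to C.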






It is of interest to note that NOT is short for AND NOT, not OR NOT.



PAPER #10 



DATA ELEMENTS AVAILABLE FOR SEARCHING 



The machine-readable bibliographic records being manipulated, searched,
and retrieved by various search systems usually contain much more than just
the subject index terms that tend to be concentrated on during searching.
This is certainly true of ERIC and it holds true generally for the other
major data bases as well.

The existence of these other data should not be forgotten. It is surprising
how frequently they can be put to use with advantage in a subject search if the
software being used permits it. In a search on BEHAVIOR MODIFICATION, for
instance, surely it would pay to examine the works authored by B. F. Skinner.
A search on ARTIFICIAL INTELLIGENCE might well add as a significant parameter
the Institution (laboratory) by this name at the Massachusetts Institute of
Technology. A search involving some aspect of JUNIOR COLLEGES might prefer
to limit itself to input from the ERIC Clearinghouse on Junior Colleges in order
to ensure high relevance for its subject search. A search on SPACE SCIENCES
education might like to restrict itself to items having Report Numbers beginning
NASA-EP, in order to pick up all the NASA educational publications in the ERIC
system.

The above are just a few of the ways that non-subject data elements might
come into play in what is basically a subject search. The ability to access
these data depends completely on the software system being used. If the
search system is a linear or sequential search against the ERIC Master Files,
for instance, then it is necessary to pass all the data by the reading heads
of the tape drives and the chances are that the system provides (or can be
easily modified to provide) access to either or both non-subject and subject
fields. If the search system first queries an inverted index file to determine
the accession numbers of the "hits" that satisfy the specified conditions,
then the first pass can involve only those data elements for which index files
exist. This automatically excludes many of the non-subject data elements.
However, in the second phase of such systems, it is necessary for them to go
to the Master File and extract the full records for the "hits". It is sometimes
possible to apply non-subject restrictions at this stage of the process, after
there has already been a winnowing down on the basis of subject.
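The two-phase arrangement described above, an inverted index consulted first and the Master File consulted second, might be sketched as follows. The accession numbers, the field names, and the clearinghouse codes are all invented for the illustration and do not reproduce the actual ERIC record layout.

```python
# Hypothetical master file: accession number -> full record.
MASTER_FILE = {
    "ED000001": {"descriptors": {"JUNIOR COLLEGES", "CURRICULUM"}, "clearinghouse": "JC"},
    "ED000002": {"descriptors": {"JUNIOR COLLEGES"}, "clearinghouse": "HE"},
    "ED000003": {"descriptors": {"CURRICULUM"}, "clearinghouse": "JC"},
}

def build_inverted_index(master):
    """Index file: descriptor -> set of accession numbers posted to it."""
    index = {}
    for accession, record in master.items():
        for term in record["descriptors"]:
            index.setdefault(term, set()).add(accession)
    return index

def search(master, index, descriptor, **limits):
    """Phase 1: subject hits from the inverted index.
    Phase 2: pull each full record and apply any non-subject limits."""
    hits = index.get(descriptor, set())
    return sorted(
        acc for acc in hits
        if all(master[acc].get(field) == value for field, value in limits.items())
    )

index = build_inverted_index(MASTER_FILE)
subject_only = search(MASTER_FILE, index, "JUNIOR COLLEGES")
with_limit = search(MASTER_FILE, index, "JUNIOR COLLEGES", clearinghouse="JC")
print(subject_only)
print(with_limit)
```

Only the descriptor is indexed in this sketch, so the clearinghouse restriction can be applied only in the second phase, after the subject winnowing has taken place.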

A complete list of ERIC data elements available for searching appears as
Figure 1.



[Figure 1: ERIC data elements available for searching (not legible in this copy)]



PAPER #11



LEVELS OF GENERALITY AND SPECIFICITY 



It is a generally established practice, in systems (like ERIC) employing
coordinate indexing principles, that documents should be indexed at the level
of specificity of the document in hand. In other words the most accurate term
in the Thesaurus that represents the concept covered by the document should be
selected, not a term higher or lower in the hierarchy. For example, if a
document deals with HANDICAPPED CHILDREN then that term should be selected
rather than the broader term CHILDREN (perhaps coordinated with HANDICAPPED).
However, if the document refers to all kinds of children, and handicapped
children do not stand out as a distinct topic, then the broad term CHILDREN
would be most appropriate. If the document treats both children generally
and handicapped children specifically, then both CHILDREN and HANDICAPPED
CHILDREN are appropriate Descriptors, even though both are in the same generic
tree.

There are some systems which practice "automatic posting up". In other
words, if a document is indexed at some middle point in a generic tree, such as
INTELLIGENCE TESTS, it will also, as a matter of course, be indexed by its
Broader Term TESTS. When this practice is followed, unless the indexing is
tagged in some way, there is no way to distinguish between general materials
on TESTS and more specific materials on INTELLIGENCE TESTS which have also
been indexed to TESTS. The practice of posting solely to the levels actually
dealt with by the document has the advantage of permitting the searcher to
zero in with greater accuracy on the topic desired by the user. Conversely,
however, it means that if the searcher is interested in retrieving at all
levels of a given topic, it is necessary to include not only the broad generic
term covering the area, but also the many specific terms lower in the tree.
This can sometimes present the searcher with an onerous coding task. For
example, under the term AFRICAN LANGUAGES in the ERIC Thesaurus, there are
over 30 specific languages, such as SWAHILI. If a searcher is interested in
everything the system has on African languages, whether general or specific,
he must code all 30 terms into the search. Sometimes sophisticated search
systems avoid this problem by permitting the searcher to specify a given term,
plus all terms narrower to it. In other words, with one search instruction
the searcher could pick up all the specific African languages without having
to write each one down.
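The "given term plus all terms narrower to it" instruction can be pictured as a walk down the generic tree. The sketch below uses a tiny made-up fragment of a hierarchy; apart from AFRICAN LANGUAGES and SWAHILI, which are named in the text, the narrower terms shown are placeholders and are not taken from the Thesaurus.

```python
# Illustrative hierarchy fragment: term -> its immediate narrower terms.
NARROWER = {
    "AFRICAN LANGUAGES": ["SWAHILI", "HAUSA", "YORUBA"],
    "TESTS": ["INTELLIGENCE TESTS"],
    "INTELLIGENCE TESTS": ["GROUP INTELLIGENCE TESTS"],
}

def explode(term, narrower=NARROWER):
    """Return the term plus every term below it in the hierarchy."""
    found = [term]
    for child in narrower.get(term, []):
        for t in explode(child, narrower):
            if t not in found:
                found.append(t)
    return found

print(explode("AFRICAN LANGUAGES"))
print(explode("TESTS"))
print(explode("SWAHILI"))
```

A search system offering this feature would then OR together the postings of every term returned, sparing the searcher the coding of each narrower term by hand.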

It is important to the searcher to be aware of indexing practice in this 
area. Let us assume, for example, the following hypothetical indexing situations: 

1. Document discusses a general concept (e.g., SALARIES) but illustrates
   profusely from a narrower class (e.g., TEACHER SALARIES). Both are selected
   by the indexer.

2. Document concentrates on a specific concept (e.g., PSYCHOLINGUISTICS),
   but the indexer thinks the treatment is such that it adds useful information
   to the body of knowledge about the more general concept (e.g., LINGUISTICS);
   both terms are used.

3. Document discusses many specific concepts (e.g., NURSES, PHYSICAL THERAPISTS,
   DENTAL HYGIENISTS, etc.), but none in sufficient detail to merit the indexing
   of each specific concept. Instead, the generic term HEALTH OCCUPATIONS is
   used.

4. Document provides detailed treatment of several types of Agricultural
   Personnel (e.g., EXTENSION AGENTS, AGRICULTURAL LABORERS, FARM MECHANICS,
   SHARECROPPERS, FORESTRY AIDES, AGRICULTURAL TECHNICIANS, etc.). In the
   judgment of the indexer there is sufficient data on each to warrant indexing
   each specific occupational group. In addition, because there are so many
   groups involved, the general term AGRICULTURAL PERSONNEL is used. (If only two
   or three types had been treated, the generic term would probably not have
   been appropriate).

5. Document deals solely with a specific test called the "Detroit Advanced
   Intelligence Test". The indexer thinks the document should be made
   accessible via Descriptor (as well as the specific Identifier) and chooses
   the "reasonable" level INTELLIGENCE TESTS (not TESTS).

6. Document is a comprehensive treatment of SUICIDE among all classes of
   people, including STUDENTS. The slant is specifically SUICIDE and therefore
   the Broader Term DEATH is not used.

All of the above solutions are justified under the ERIC guideline to index
to the specific topic dealt with by the document. As can be seen, the indexer
is given great discretion to interpret the subtleties and emphases of the
document. The searcher must be aware of the possibilities both in order to
search effectively and to interpret search results.






PAPER #12



MAJOR-MINOR INDEX TERMS 



At the time it is used to index a document or article in the ERIC system,
every Descriptor or Identifier is identified as representing either a "Major"
concept in the document or a "Minor" concept. Vocabulary terms are not,
therefore, major or minor in themselves, but only as they are applied in a
given situation. For example, a document dealing basically with NURSERY SCHOOLS
may touch peripherally on TOYS, as one factor to consider. *NURSERY SCHOOLS
is, therefore, considered the "Major" concept and is identified as such by
being tagged with an asterisk, as shown. TOYS, given much lighter treatment,
is considered a "Minor" concept and is identified as such by the absence of
an asterisk.

The following table shows the average number of Descriptors and Identifiers
assigned to each RIE and CIJE accession and the proportion of these that are
identified as Major and Minor:





                                               RIE       CIJE

    Average Total Number of Descriptors
    Assigned to Each Accession                11.35      6.61

        Major Descriptors                      4.91      3.88
        Minor Descriptors                      6.44      2.73

    Average Total Number of Identifiers
    Assigned to Each Accession                  .97       .39

        Major Identifiers                       .18       .17
        Minor Identifiers                       .79       .22

    Average Total Number of Index Terms
    (Both Descriptors and Identifiers)
    Assigned to Each Accession                12.32      7.00

        Major Terms                            5.09      4.05
        Minor Terms                            7.23      2.95



This practice of distinguishing between Major and Minor index
terms serves two principal functions:



1. It Limits the Size of the Published Subject Index

   In order to provide indexing in depth of all concepts covered
   significantly by an accession, an average of 12.32 total terms are
   assigned to each RIE accession and an average of 7.00 total terms
   are assigned to each CIJE accession. At the present time only the
   Major terms are permitted to appear in the published Subject Indexes.
   If all of the terms were permitted to appear, these indexes would
   be over twice their present size. Can you imagine an RIE Annual
   Index twice its present size? This would be impractical from a
   publishing and economic standpoint. The Major-Minor dichotomy
   permits the ERIC system to have the benefits of both in-depth indexing
   together with practical, reasonably large, published subject indexes.

2. It Permits Searchers To Go After High Recall or High Relevance
   (Precision)

   If a searcher is interested in comprehensiveness, in getting
   everything in the system that touches on a subject, he can search
   on all the appearances of a term, without regard for Major or Minor.
   On the other hand, if the searcher wants only material that devotes
   itself heavily to the topic in question, he can restrict the search
   to the asterisked appearances of the term involved.

   If the indexers had not made the Major-Minor distinction at
   input time, all the index terms would be on the same footing and
   the searcher would not be able to tell the key subjects from the
   peripheral subjects.
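The choice between high recall and high relevance can be illustrated with a few made-up records in which Major terms carry the asterisk described above. The accession numbers and the index terms assigned to them are hypothetical.

```python
# Hypothetical accessions; an asterisk marks a term assigned as Major.
RECORDS = {
    "ED000001": ["*NURSERY SCHOOLS", "TOYS"],
    "ED000002": ["*TOYS", "PLAY"],
    "ED000003": ["NURSERY SCHOOLS", "*TEACHER AIDES"],
}

def search(term, major_only=False):
    """Return accessions posted to the term; with major_only, only the
    asterisked (Major) appearances of the term count."""
    hits = []
    for accession, terms in RECORDS.items():
        if "*" + term in terms or (not major_only and term in terms):
            hits.append(accession)
    return hits

high_recall = search("TOYS")                       # Major and Minor appearances
high_relevance = search("TOYS", major_only=True)   # Major appearances only
print(high_recall)
print(high_relevance)
```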






PAPER #13 



IDENTIFIERS 



There are two types of indexing terms used in the ERIC system: 
Descriptors and Identifiers. Descriptors are tightly controlled, defined, 
and cross-referenced, and appear in the Thesaurus of ERIC Descriptors.
They represent relatively well-known subject matter concepts such as 
ANTHROPOLOGY, NURSERY SCHOOLS, TEACHING MACHINES, etc. Identifiers represent 
virtually anything else that an indexer might like to subject index a document 
by. The Identifier field is meant to be a very open and unconstrained field 
giving the indexer great freedom and nearly complete discretion to include 
index access points that are deemed useful to the user. 

Identifiers are, in almost all cases, the names of specific entities. 
As there is a nearly infinite number of specific entities, it is not appropriate 
to burden a thesaurus with such a multiplicity of entries. Also, Identifiers, 
being so specific and often transitory, may be represented in the literature 
very infrequently; this fact also argues for separate treatment. 

The major purpose of Identifiers is to provide additional indexing depth,
of a specialized nature, supplementing that provided by Descriptors. Identifiers
may be specific projects, geographic locations, persons, trade names, tests,
legislation, organizations, equipment, etc. It is also possible to use the
Identifier category as a testing ground for a term whose permanence may be in
some doubt. If the term demonstrates over time its acceptability by the
profession, it may graduate from Identifier to Descriptor status, e.g.,
Computer Assisted Instruction (CAI). Identifiers are not defined (scoped),
cross-referenced, structured (related to one another), or otherwise subjected
to lexicographic analysis. In order to aid retrieval, however, it is necessary
to observe certain standards in their construction and to see that the more
frequently used ones appear in the file in a uniform format. The ERIC Processing
Manual includes, as Appendix G, a list of the more heavily used Identifiers, in
their preferred format.

The following is a list of the major categories of Identifiers, together 
with an example of each: 



    CATEGORY                                   EXAMPLE

    Acronyms                                   PERT
    Coined Terminology                         Sesame Street
    Conferences/Meetings/Seminars/Symposia     National Reading Conference
    Equipment                                  Autotutor
    Ethnic Groups/Tribes                       Shoshones
    Geographic Locations                       New York City
    Legislation                                Taft Hartley Act
    Methods and Theories                       Montessori Method
    Organizations
        Community Organizations                Los Angeles Chamber of Commerce
        Educational Organizations              Parent Teacher Association
        Foundations                            Ford Foundation
        Government Agencies                    National Institute of Education
        Industrial Organizations               Westinghouse Corporation
        School Districts                       Milford Kansas School District
    Personal Names                             Skinner (B F)
    Projects                                   Project Talent
    Tests and Testing Programs                 Scholastic Aptitude Test
    Textbooks                                  Uralic and Altaic Series
    Trade Names                                Erictapes

Note that Identifiers, like Descriptors, are tagged at indexing time 
as representing Major or Minor concepts in the document being processed. 

The following notice, which appeared in Interchange ^3 illustrates well 
how Identifiers can play a large role in a search, supplementing the Descriptors.



BRITISH INFANT SCHOOL -
SEARCH STRATEGY

Carolyn Trohoski of RISE writes that a search for
material in the ERIC system on the subject of the British
Infant Schools requires the use of numerous Identifiers
as well as Descriptors. The terms she used in her search,
and that she finds are worth passing on to others, are
shown in the table below. If it is desired to limit output
solely to actual British references to these schools, as
opposed to U. S. applications of the same principles, the
searcher should intersect with the geographic Identifiers:
ENGLAND or GREAT BRITAIN or UNITED
KINGDOM.





         TERM                               DESCRIPTOR   IDENTIFIER   RIE   CIJE

     1.  British Infant School                               X         X     X
     2.  British Infant School Theory                        X               X
     3.  British Infant Schools                              X         X     X
     4.  British Primary Schools                             X               X
     5.  Informal British Infant Schools                     X         X
     6.  Informal British Schools                            X               X
     7.  Infant Schools                                      X         X     X
     8.  Leicestershire Infant Schools                       X         X
     9.  Open Classrooms                                     X               X
    10.  Open Education                         X            X         X     X
    11.  Open Education Model                                X               X
    12.  Open Education System                               X         X
    13.  Open Plan Schools                      X                      X     X
    14.  Open School                                         X         X
    15.  Open Schools                                        X         X     X



PAPER #14



IMPORTANCE OF KNOWING DESCRIPTOR FREQUENCY (POSTING) STATISTICS 



In most information storage and retrieval systems the subject index
terms display an enormous variation in the frequency with which they are
used. In the ERIC system, for example, there is one term (INSTRUCTIONAL
MATERIALS) that has been used over 4,000 times. There are 4 terms that
have been used over 3,000 times. On the other hand, there are 136
terms that have been used only once. The attached Figure A gives some
indication of the spread of the terms over the various usage levels.

It is absolutely essential for a searcher to know the usage levels
for the terms being used in a search. It is possible to mismatch Descriptors
so that the possibility of there being any hits becomes very poor. For
example, Figure B depicts a situation where there is a total file amounting
to 100,000 references. Term A has been used 1,000 times; Term B has been
used 500 times; Term C has been used 5 times. If the assumption is made
that the usages of these terms are equally likely to be scattered across
any item in the file then the chances of there being an item containing
both A and B is the multiplication of their separate probabilities, i.e.,

    Chance of A appearing is  1,000/100,000 = 1/100

    Chance of B appearing is  500/100,000 = 1/200

    Chance of both A and B appearing is  1/100 x 1/200 = 1/20,000

With a probability of 1/20,000 and a file size of 100,000, the anticipated
number of hits would be 1/20,000 x 100,000 = 5.

However, to intersect A and B and C would be to decrease the probability
of any hits to essentially zero. For example:

    Chance of A = 1/100

    Chance of B = 1/200

    Chance of C = 5/100,000 = 1/20,000

    Chance of A and B and C = 1/100 x 1/200 x 1/20,000 = 1/400,000,000



Note: Usage figures cited are as of June 1972.



Because of the mismatch between the usage level of C and the other two
terms, the chances of a hit involving A and B and C are one in 400 million,
or essentially no chance at all. C should not be intersected with any other
term; its postings should probably be examined in their entirety.
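The arithmetic above is easy to repeat for any combination of terms once their posting counts are known. The sketch below uses the figures from the example (a file of 100,000 references; A, B, and C posted 1,000, 500, and 5 times) and makes the same simplifying assumption, namely that the terms are scattered independently across the file. The function name is illustrative.

```python
from fractions import Fraction

FILE_SIZE = 100_000
POSTINGS = {"A": 1_000, "B": 500, "C": 5}

def expected_hits(terms, postings=POSTINGS, file_size=FILE_SIZE):
    """Probability that one record carries all the terms, and the number of
    such records expected in the file, assuming the terms are independent."""
    probability = Fraction(1)
    for t in terms:
        probability *= Fraction(postings[t], file_size)
    return probability, probability * file_size

p_ab, hits_ab = expected_hits("AB")
p_abc, hits_abc = expected_hits("ABC")
print("A and B:      ", p_ab, "->", float(hits_ab), "records expected")
print("A and B and C:", p_abc, "->", float(hits_abc), "records expected")
```

A searcher can use the same calculation before coding a question: when the expected number of records falls far below one, the low-posted term is better examined on its own than intersected.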

Any examination into the probabilities involved in information retrieval
makes one realize rather quickly that the very broad or general terms, with
a high frequency of postings, that many people are fond of saying are of no
use for retrieval, do indeed have their value in the context of machine
searching. If indexers over a period of years have used INSTRUCTIONAL
MATERIALS over 4,000 times, it is plain that this is one of the central topics
appearing in the ERIC literature. By intersecting such a term with other,
less frequent terms, the term can definitely serve as a filter and its high
volume of postings is not necessarily a liability when the comparisons are
being done by a high speed computer rather than a human being.







                          ERIC DESCRIPTOR USAGE
               (Distribution of Postings by Various Ranges)

                     1970 (June)          1971 (June)          1972 (June)
    Range of       Number   Percent-    Number   Percent-    Number   Percent-
    Postings      of Terms    age      of Terms    age      of Terms    age

    1                229      5.08        184      3.87        136      2.81
    2-3              337      7.48        301      6.33        250      5.16
    4-9              626     13.89        570     11.98        522     10.77
    10-49                                1,632     34.30      1,608     33.18
    50-99                                  778     16.35        784     16.18
      (10-99)      2,403     53.30      2,410     50.65      2,392     49.36
    100-199          534     11.85        651     13.68        725     14.96
    200-299          205      4.55        272      5.72        319      6.58
    300-399           74      1.64        143      3.01        173      3.57
    400-499           32      0.71         70      1.47        106      2.19
    500-599           23      0.51         58      1.22         57      1.18
    600-699           10      0.22         20      0.42         52      1.07
    700-799            7      0.16         20      0.42         24      0.49
    800-899            6      0.13         13      0.27         20      0.41
    900-999            5      0.11          3      0.06         10      0.21
    1,000-1,999       15      0.33         33      0.69         45      0.93
    2,000-2,999        2      0.04          9      0.19         10      0.21
    3,000-3,999        0      0             1      0.02          4      0.08
    4,000+             0      0             0      0             1      0.02

    Totals         4,508    100.00      4,758    100.00      4,846    100.00

                                FIGURE A



TOTAL FILE: 100,000 REFERENCES

Probability of each term appearing in a record:

    A:  1,000/100,000 = 1/100
    B:    500/100,000 = 1/200
    C:      5/100,000 = 1/20,000

Documents indexed by A and B can be expected to be about
1/100 x 1/200 = 1/20,000 or, for a file of 100,000
references, about 5 records.

Documents indexed by A and B and C can be expected to be about
1/100 x 1/200 x 1/20,000 = 1/400,000,000 or,
for a file of 100,000 references, about .0002 records.

NOTE: Even if A and B were very likely to appear together and the number of records
having both of them were 100 instead of 5, the probability of A and B and C all
appearing together is still remote:

    PROBABILITY OF A AND B:        100/100,000 = 1/1,000
    PROBABILITY OF A AND B AND C:  1/1,000 x 1/20,000 = 1/20,000,000

or, for a file of 100,000 references, about .005 records.

FIGURE B
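
The arithmetic of Figure B can be checked with a few lines of Python. This is only a sketch: it assumes, as the figure does, that terms are assigned independently of one another, and the function name `expected_hits` is an illustrative choice, not part of any ERIC software.

```python
from fractions import Fraction

FILE_SIZE = 100_000  # total references in the file, as in Figure B

# Postings per term as read from Figure B (A = 1,000; B = 500; C = 5).
postings = {"A": 1_000, "B": 500, "C": 5}


def expected_hits(terms, postings, file_size):
    """Expected number of records carrying every term in `terms`,
    assuming the terms are assigned independently of one another."""
    probability = Fraction(1)
    for term in terms:
        probability *= Fraction(postings[term], file_size)
    return probability * file_size


ab = expected_hits(["A", "B"], postings, FILE_SIZE)
abc = expected_hits(["A", "B", "C"], postings, FILE_SIZE)
print(float(ab))   # 5.0
print(float(abc))  # 0.00025
```

The second figure is why a term as thinly posted as C is better retrieved in its entirety than intersected with anything else.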



COMMON DESCRIPTOR SELECTION PROBLEMS (EXAMPLES)



PAPER #15



1. No Descriptor Representing Concept

DIFFERENTIAL PSYCHOLOGY          PARAPSYCHOLOGY
INSTINCTS                        SLEEP TEACHING
LEARNING CENTERS                 SPORTS (SPECIFIC SPORTS)

2. Descriptor Best Found Via Other Displays (e.g., Rotated)

EIGHTEENTH CENTURY LITERATURE 

TWENTIETH CENTURY LITERATURE 

NOTE : No entries in alphabetic display under CENTURY 

VISUALLY HANDICAPPED 

NOTE : Cannot be found under HANDICAPPED in alphabetic display 
because treated as NT to PERCEPTUALLY HANDICAPPED 

LOCAL HOUSING AUTHORITIES

MALAYO POLYNESIAN LANGUAGES 

STUDENT SCIENCE INTERESTS 

3. Low Posted Descriptors (Must Not Intersect)

HORSES (2)

METALLURGICAL TECHNICIANS (1)
PARKING METERS (1)
SEISMOLOGY (2)

4. Descriptor Too Specialized (?)

ANISEIKONIA 
AUDITORY AGNOSIA 
CORN 

HAGIOGRAPHIES 
HETEROPHORIA 

HIGH INTEREST LOW VOCABULARY BOOKS 

HORIZONTAL TEXTS 

ONOMASTICS 

TAGMEMIC ANALYSIS 

TRANSFORMATION GENERATIVE GRAMMAR 

6. Descriptor Not Defined 

ART SONG INNER SPEECH (SUBVOCAL) 

ARTICULATION (PROGRAM) INPUT OUTPUT ANALYSIS 

CONCEPTUAL SCHEMES MILIEU THERAPY 

CONNECTED DISCOURSE NOMINALS 

COORDINATION COMPOUNDS NONGRADED CLASSES

DEEP STRUCTURE UNGRADED CLASSES 

DISTINCTIVE FEATURES SERVICE OCCUPATIONS 
INDIVIDUAL PSYCHOLOGY 



7. Descriptor Very Broad (Must be Intersected)

ABILITY                          METHODS
ATTITUDES                        NEEDS
BACKGROUND                       OBJECTIVES
BEHAVIOR                         PERFORMANCE
DATA                             PLANNING
DEVELOPMENT                      PROBLEMS
EDUCATION                        PROGRAMS
ENVIRONMENT                      RESEARCH
EVALUATION                       SCIENCES
GROUPS                           STUDY
GUIDES                           TEACHING
INSTRUCTION                      TECHNIQUES
LEARNING                         THEORIES


8. Descriptors So Close in Meaning That They Must Be Used Together
(Near Synonyms)



HEREDITY 
GENETICS 



NONFARM YOUTH
I^YOUTH 

EDUCATIONAL TELEVISION
INSTRUCTIONAL TELEVISION
TELEVISED INSTRUCTION

INFORMATION RETRIEVAL
INFORMATION SEEKING
INFORMATION NEEDS
INFORMATION PROCESSING
INFORMATION SERVICES
INFORMATION DISSEMINATION
etc.

COUNSELING CENTERS
GUIDANCE CENTERS
etc.

EDUCATIONAL COUNSELING
EDUCATIONAL GUIDANCE
etc.

EDUCATIONAL NEEDS
EDUCATIONAL OBJECTIVES

EVALUATION METHODS
EVALUATION TECHNIQUES

PRE-SCHOOL PROGRAMS
PRE-SCHOOL EDUCATION
PRE-SCHOOL CURRICULUM



IV PRACTICE SESSION IN STRUCTURING SEARCHES 






SEARCH #1 
TEXT EDITING BY CATHODE RAY TUBE 




CATHODE RAY TUBE 
FAMILY OF TERMS 



ERIC Thesaurus 



CATHARSIS 060
SN RELAXATION OF EMOTIONAL TENSION BY
   EXPRESSIVE REACTION
UF ABREACTION
   PSYCHOCATHARSIS
BT EMOTIONAL EXPERIENCE
RT AGGRESSION
   ANXIETY
   EMOTIONAL DEVELOPMENT
   HOSTILITY
   PSYCHOLOGICAL PATTERNS
   PSYCHOTHERAPY
   REACTIVE BEHAVIOR
   SELF EXPRESSION

CATHOLIC EDUCATORS
BT TEACHERS

CATHOLICS
CATHOLIC SCHOOLS



ERIC Thesaurus Entries 
for Terms Suggested By 
RIE Indexing



COMPUTER ORIENTED PROGRAMS
^ W l!?..^f^?^I'ON OP COHPtrtER 



270 



No Descriptor 



RT 



CMURC^ 
CW7PC 
NUH T 
RELIO. 



COLLEQES 



.ATION 



Identifier Usage Report 



C«tell Intent Sc«l« 
Cathode Ray Tube
CATHOLIC CHURCH 



ED047>0j )ED0«t7^J<t £pOU002 
EDOtflbb 



An Accession Indexed by
CATHODE RAY TUBE 



EM 008 705 



ED 047 504 ^ 

Ttutmax, Da\ rd B 

Two Applications of Simulation in the Educational
Environment. Tech Memo.

Florida State Univ., Tallahassee. Computer-Assisted
Instruction Center.
Spons Agency-- Office of Naval Research,
Washington, D.C. Personnel and Training
Research Programs Office
Report No-- AD-718 847; TM-31
Pub Date 71

Note-- 27p.; Paper presented at the Annual Meeting
of the American Educational Research
Association (New York, N.Y., February 4-7,
1971)

Available from-- National Technical Information
Service, Springfield, Virginia 22151 (AD-718
847, MF $.95, HC $3.00)

Document Not Available from EDRS



Dc >cripior%^ftoh>vio?ftl 
Lomouttr OrkA tod 



Or, 
EilucatH^ntS 



Scitnc« 






trangXTtmcs, Hypothnii Tcitinij gnput Outpi5> 
3cvKci> Inieraciion Maihcmttlcai ModcItT 
ffcmatfct Instruction. ^Simulation, Sutisti- 
cal Analyiii, Typewriting 
Identifiers-- APL, A Programing Language,
Cathode Ray Tube, CRT, Florida State University,
IBM 1500 Instructional System, Statistical
Simulation, STATSIM

Two educational computer simulations are
described in this paper. One of the simulations is
STATSIM, a series of exercises applicable to
statistical instruction. The content of the other
simulation is comprised of mathematical learning
models. Student involvement, the interactive nature
of the simulation, and terminal display of
matcrir*< are fc» 'ommon Hoth siir 
tions. learn* 'ions o c 




   TECHNOLOGY TO EDUCATION FOR BOTH
   INSTRUCTIONAL AND BUSINESS
   APPLICATION
BT COMPUTER PROGRAMS
RT COMPUTER ASSISTED INSTRUCTION
   COMPUTERS
   COMPUTER SCIENCE
   COMPUTER SCIENCE EDUCATION
   EDUCATIONAL TECHNOLOGY
   ELECTRONIC DATA PROCESSING
   PROGRAMED INSTRUCTION
   TIME SHARING



050 




DISPLAY SYSTEMS
BT INFORMATION SYSTEMS
RT COMPUTERS
   ELECTRONIC DATA PROCESSING
   ELECTRONIC EQUIPMENT
   INFORMATION PROCESSING
   INPUT OUTPUT
   MAN MACHINE SYSTEMS
   SCREENS (DISPLAYS)



ELECTRONIC DATA PROCESSING
SN DATA PROCESSING BY MEANS OF
   COMPUTERS
UF ADP
   AUTOMATIC DATA PROCESSING
   EDP
BT COMPUTER SCIENCE
   DATA PROCESSING
RT AUTOMATION
   COMPUTER ORIENTED PROGRAMS
   COMPUTERS
   COMPUTER SCIENCE EDUCATION
   COMPUTER STORAGE DEVICES
   DATA BASES
   DATA PROCESSING OCCUPATIONS
   DISPLAY SYSTEMS
   INFORMATION SYSTEMS
   INPUT OUTPUT DEVICES
   ON LINE SYSTEMS
   OPTICAL SCANNERS
   PROGRAMING
   PROGRAMING LANGUAGES



INPUT OUTPUT DEVICES
UF INPUT DEVICES
   OUTPUT DEVICES
NT OPTICAL SCANNERS
BT EQUIPMENT
RT COMPUTER OUTPUT MICROFILM
   COMPUTERS
   COMPUTER STORAGE DEVICES
   ELECTRONIC DATA PROCESSING
   ELECTRONIC EQUIPMENT
   FACSIMILE TRANSMISSION
   INFORMATION PROCESSING
   INPUT OUTPUT
   MAGNETIC TAPES
   ON LINE SYSTEMS
   TELECOMMUNICATION



HAN HACmwE SY9TDIS . OSO 

SN MEN AND MACHINES INTERACTING TO
   FORM SINGLE SYSTEMS
UF MAN MACHINE COMMUNICATION
   MAN MACHINE INTERACTION
   MAN MACHINE INTERFACE
RT ADVANCED SYSTEMS
   AUTOMATION
   BIONICS
   COMPUTER ASSISTED INSTRUCTION
   CYBERNETICS
   DIAL ACCESS INFORMATION SYSTEMS
   DISPLAY SYSTEMS
   FEEDBACK
   HUMAN ENGINEERING
   INTERACTION
   MANAGEMENT SYSTEMS
   ON LINE SYSTEMS



170 



The terms selected for our first group, Cathode Ray Tube, are therefore: 



Closely Related:



CATHODE RAY TUBE 
CRT 

DISPLAY SYSTEMS 
INPUT OUTPUT DEVICES 
ON LINE SYSTEMS 
MAN MACHINE SYSTEMS 
SCREENS (DISPLAYS) 

Broader Terms (could be dropped If output too high) 

COMPUTER ORIENTED PROGRAMS 
COMPUTER SCIENCE 
COMPUTERS 

ELECTRONIC DATA PROCESSING



Let's take Text Editing now. 






ECONOMICS 
CONSUMER ECONOMICS 

HOME ECONOMICS EDUCATION 
EDUCATIONAL ECONOMICS
HOME ECONOMICS 
LABOR ECONOMICS 
OCCUPATIONAL HOME ECONOMICS 
RURAL ECONOMICS 
HOME ECONOMICS SKILLS 
HOME ECONOMICS TEACHERS 
EDITING 
EDITORIALS 

EDUCABLE MENTALLY HANDICAPPED 
EDUCATION 



TEXTBOOK ASSIGNMENTS
TEXTBOOK BIAS
TEXTBOOK CONTENT
TEXTBOOK EVALUATION 
TEXTBOOK PREPARATION 
TEXTBOOK PUBLICATIONS 
TEXTBOOK RESEARCH 
TEXTBOOK SELECTION 
TEXTBOOK STANDARDS 
TEXTBOOKS 
HISTORY TEXTBOOKS 
MULTICULTURAL TEXTBOOKS 
SUPPLEMENTARY TEXTBOOKS 

TEXTILES INSTRUCTION 
HORIZONTAL TEXTS 
PROGRAMED TEXTS 
VERTICAL TEXTS 

TEXTUAL CRITICISM 
THAI 

THEATER ARTS 

ERIC Thesaurus - Alphabetic Display 



TEXT EDITING 



FAMILY OF TERMS 



ERIC Thesaurus - 

Rotated Descriptor Display 



Terms Suggested by Cross-Ref erenc- 



conreNT analysis i$o 

BT EVALUATION METHODS
RT COMMUNICATION (THOUGHT TRANSFER)
   COURSE CONTENT
   CRITICAL READING
   DATA ANALYSIS
   EDITING
   ITEM ANALYSIS
   LITERARY ANALYSIS
   LITERARY CRITICISM
   LITERATURE REVIEWS
   TEXTBOOK CONTENT



EDITING
SN TO MAKE SUITABLE FOR PUBLICATION OR
   FOR PUBLIC PRESENTATION BY
   SELECTING, EMENDING, REVISING, AND
   COMPILING
UF COPYEDITING
BT EVALUATION METHODS
RT CONTENT ANALYSIS
   FILMS
   JOURNALISM
   LANGUAGE ARTS
   LANGUAGE STYLES
   NEWS MEDIA
   PUBLICATIONS



TEXTUAL CRITICISM
BT LITERARY CRITICISM
RT ANALYTICAL CRITICISM
   CHRONICLES
   FORMAL CRITICISM
   HISTORICAL CRITICISM
   ITALIAN LITERATURE
   LITERARY ANALYSIS
   LITERARY CONVENTIONS
   LITERARY GENRES
   LITERATURE



264 




Combined Descriptor/Identifier List



I EDITH OnCEN 
1 COITINO 

1 EOITINO AS A MAY OF LIPC 
3 COITINO PWCCOUKS 

r» EDITORIALS 

I COL ICAOINO VCRSATILITY TCSTft 



1 

3 

I 

I 

10 



TEXAS. UNIVERSITY OF TEXAS AT AUST
TEXT BOOKS

TEXT HANDLING SYSTEMS
TEXT PAC SYSTEM 
TEXT PROCESSING 
TEXT SEARCHING 
TEXTBOOK ASSIGNMENTS 



LITERARY ANALYSIS
NT LITERARY DISCRIMINATION
BT EVALUATION METHODS
   LITERATURE
RT ANALYTICAL CRITICISM
   CHARACTERIZATION (LITERATURE)
   COMEDY
   COMPOSITION (LITERARY)
   CONTENT ANALYSIS
   CRITICAL READING
   DRAMA
   EPICS
   FIFTEENTH CENTURY LITERATURE
   FIGURATIVE LANGUAGE
   FORMAL CRITICISM
   FRENCH LITERATURE
   GERMAN LITERATURE
   HISTORICAL CRITICISM
   LITERARY CONVENTIONS
   LITERARY CRITICISM
   LITERARY GENRES
   LITERARY STYLES
   LOCAL COLOR WRITING
   MEDIEVAL ROMANCE



LITERARY CRITICISM 260
NT ANALYTICAL CRITICISM
   ARISTOTELIAN CRITICISM
   FORMAL CRITICISM
   HISTORICAL CRITICISM
   IMPRESSIONISTIC CRITICISM
   LITERARY STYLES
   MORAL CRITICISM
   MYTHIC CRITICISM
   PLATONIC CRITICISM
   RHETORICAL CRITICISM
   TEXTUAL CRITICISM
   THEORETICAL CRITICISM
RT CHARACTERIZATION (LITERATURE)
   CONTENT ANALYSIS
   CRITICAL READING
   DIALOGUE
   DRAMATIC UNITIES
   EIGHTEENTH CENTURY LITERATURE
   ENGLISH NEOCLASSIC LITERARY PERIOD
   EXISTENTIALISM
   FIFTEENTH CENTURY LITERATURE



96 TEXTBOOK BUS 






The terms selected for our second group, Text Editing, are 
therefore: 



Closely Related : 
EDITING 

EDITING PROCEDURES 
TEXT HANDLING SYSTEMS 
TEXT PROCESSING 
TEXT SEARCHING 



More distantly related, but might have been used by indexers in 
absence of specific term: 

CONTENT ANALYSIS 
LITERARY ANALYSIS 
LITERARY CRITICISM 
TEXTUAL CRITICISM 






The final search statement can be structured in several ways:

1. Simple Intersection of Two Groups




CRT 



AND 



Content Analysis 
or 

Editing

or 

Editing Procedures 
or 

Literary Analysis 
or 

Literary Criticism 
or 

Text Handling Systems 
or 

Text Processing 
or 

Text Searching 
or 

Textual Criticism 



or 

Man Machine Systems 
or 

Screens (Displays)



2. Absolute retrieval of documents indexed by highly specific, but low 
posted terms, with intersection of remaining terms in each group. 

Cathode Ray Tube OR CRT OR Editing OR Text Handling Systems OR Text
Processing OR Text Searching



Screens (Displays) 



In addition, the search involving the terms regarded as broader or peripheral
could be handled separately in order to siphon off most of the "false drops" and
low relevance material.



Computer Oriented Programs 
or 

Computer Science 
or 

Computers 

or 

Display Systems 
or 

Electronic Data Processing 



AND 



Content Analysis 
or 

Literary Analysis 
or 

Literary Criticism 
or 

Textual Criticism 



or 

Input Output Devices
or

On Line Systems
or

Man Machine Systems
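
The logic of the search statements above can be sketched in Python, treating each term's postings as a set of accession numbers. This is only an illustration: the postings below are invented, and `any_of` is a hypothetical helper, not a function of any actual search program.

```python
# Invented postings for illustration: descriptor -> set of record ids.
postings = {
    "CRT": {"doc1", "doc2"},
    "SCREENS (DISPLAYS)": {"doc2", "doc3"},
    "EDITING": {"doc2", "doc4"},
    "TEXT PROCESSING": {"doc3", "doc5"},
}


def any_of(terms, postings):
    """OR logic: the union of the postings of every term in the group."""
    hits = set()
    for term in terms:
        hits |= postings.get(term, set())
    return hits


# Each group is OR-ed internally; the two groups are then AND-ed.
group_1 = any_of(["CRT", "SCREENS (DISPLAYS)"], postings)
group_2 = any_of(["EDITING", "TEXT PROCESSING"], postings)
result = group_1 & group_2
print(sorted(result))  # ['doc2', 'doc3']
```

"Absolute retrieval" of a low posted term corresponds to adding its postings to the result with a further union rather than intersecting it with anything.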



SEARCH #2 



The use of audiovisual materials, instructional media, and innovative
teaching techniques in teaching social studies to elementary level minority
group children in urban school systems. Not interested in rural or small
schools, or anything written before 1968.

This inquiry, on first inspection, seems to involve five major groups, 
a NOT function, and a date limitation, as shown below: 




As the search strategy is considered, the searcher observes that it is
better to NOT out the unwanted academic levels than to include "Elementary"
level terms in an AND function, because academic levels are not always assigned
by the indexers. However, including "Urban" terms in an AND group will effectively
eliminate "Rural Schools", so a NOT function is not necessary to handle that
particular restriction.

An effective way into the indexing vocabulary is to use the Rotated Descriptor 
Display for the following terms which appear in the inquiry: 

Social Studies

Audiovisual

Media

Innovation

Minority

Urban

High Schools



The following groups begin to take shape:

FIRST GROUP

Social Studies
or

Social Studies Units

AND

SECOND GROUP

Audiovisual Aids
or

Audiovisual Instruction
or

Audiovisual Programs
or

Instructional Media
or

Instructional Innovation

AND

THIRD GROUP

Minority Group Children
or

Minority Groups

AND

FOURTH GROUP

Urban Schools

BUT NOT

FIFTH GROUP

High Schools
or

High School Curriculum
or

High School Students
or

Junior High Schools
or

Junior High School Students



These terms can in turn be looked up in the Thesaurus, or the other 
tools, to add the following terms to the groups as follows: 



SECOND GROUP

Multimedia Instruction

AND

THIRD GROUP

Ethnic Groups
or

Negroes
or

Negro Students

AND

FOURTH GROUP

Urban Areas
or

Urban Education
or

Urban Teaching

BUT NOT

FIFTH GROUP

Secondary Schools



Check the postings to each term to get an idea of how many entries we
are working with (e.g., URBAN SCHOOLS has over 800 postings, but since so many
groups are being intersected, more "Urban" terms are picked up).

Limit the first group, "Social Studies", to major usage only since that
is the requestor's prime concern, and since most teachers are not interested
in wading through a lot of material.

The stipulation for no material written before 1968 must be handled
according to the system capabilities: by ED number, by RIE issue, by
publication date, or whatever your particular software allows.






The final logic and Venn diagram might be as shown below. Though
the number of total terms is fairly large (23), do not be deceived; this
is a very tight search (4 intersections) and could not result in a large
output.



FIRST GROUP

Social Studies
or

Social Studies Instruction

AND

SECOND GROUP

Audiovisual Aids
or

Audiovisual Instruction
or

Audiovisual Programs
or

Instructional Innovation
or

Instructional Media
or

Multimedia Instruction

AND

THIRD GROUP

Minority Group Children
or

Minority Groups
or

Ethnic Groups
or

Negro Students
or

Negroes

AND

FOURTH GROUP

Urban Areas
or

Urban Education
or

Urban Schools
or

Urban Teaching

BUT NOT

FIFTH GROUP

High School Curriculum
or

High School Students
or

High Schools
or

Junior High School Students
or

Junior High Schools
or

Secondary Schools
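
The shape of this logic (several OR-ed groups intersected, then a NOT group subtracted) can be sketched as follows. The record numbers are invented and each group is cut down to a single term for brevity; `union` and `run_search` are illustrative names only, and the sketch does not model the major-usage or date restrictions.

```python
from functools import reduce


def union(terms, postings):
    """OR logic: every record posted to at least one of the terms."""
    return set().union(*(postings.get(term, set()) for term in terms))


def run_search(and_groups, not_group, postings):
    """Intersect the OR-ed groups, then subtract the NOT group."""
    ored = [union(group, postings) for group in and_groups]
    hits = reduce(set.intersection, ored)  # assumes at least one AND group
    return hits - union(not_group, postings)


# Invented postings: descriptor -> set of record numbers.
postings = {
    "SOCIAL STUDIES": {1, 2, 3, 4},
    "AUDIOVISUAL AIDS": {2, 3, 4, 9},
    "MINORITY GROUPS": {3, 4, 7},
    "URBAN SCHOOLS": {3, 4, 8},
    "HIGH SCHOOLS": {4},
}

hits = run_search(
    and_groups=[
        ["SOCIAL STUDIES"],
        ["AUDIOVISUAL AIDS"],
        ["MINORITY GROUPS"],
        ["URBAN SCHOOLS"],
    ],
    not_group=["HIGH SCHOOLS"],
    postings=postings,
)
print(sorted(hits))  # [3]
```

Each additional AND group can only shrink the result, which is why a search with four intersections stays small however many terms are OR-ed inside each group.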



SEARCH #3 



Research on the use of criterion reference testing in compensatory
education programs, specifically ones funded under ESEA Title I. User thought
California Test Bureau had done some work in this area. Elementary level;
reading especially, but will accept other disciplines.



This is a three group search.




Let's suppose the searcher handling this request is not familiar with
criterion tests. (If a telephone or interview request were involved, the
searcher would have gotten as much information as possible from the requestor.
If a letter or other remote means were involved, the searcher might well be
on his own with the system.)






THESAURUS 



ALPHABETIC DISPLAY



THESAURUS - GROUP DISPLAY 



CRITERION MEASURES
USE CRITERION REFERENCED TESTS

CRITERION REFERENCED TESTS 520
SN ANY TEST DESIGNED AND CONSTRUCTED
   ACCORDING TO EXPLICIT RULES LINKING
   AN INDIVIDUAL'S PERFORMANCE TO
   BEHAVIORAL REFERENTS
UF CRITERION MEASURES
   CRITERION TESTS
BT TESTS
RT ACHIEVEMENT TESTS
   DIAGNOSTIC TESTS
   GROUP TESTS
   ITEM BANKS
   MEASUREMENT TECHNIQUES
   NORM REFERENCED TESTS
   OBJECTIVE TESTS
   PROGNOSTIC TESTS
   STANDARDIZED TESTS
   TEST CONSTRUCTION

CRITERION TESTS
USE CRITERION REFERENCED TESTS



THESAURUS - HIERARCHICAL DISPLAY

TESTS
.ACHIEVEMENT TESTS
..COLLEGE ENTRANCE EXAMINATIONS
..EQUIVALENCY TESTS
..LANGUAGE TESTS
..NATIONAL COMPETENCY TESTS
..PERFORMANCE TESTS
..READING TESTS
...INFORMAL READING INVENTORY
...READING READINESS TESTS
..SCIENCE TESTS
.APTITUDE TESTS
..INTEREST TESTS
..OCCUPATIONAL TESTS
.CLOZE PROCEDURE
.CREATIVITY TESTS
.CRITERION REFERENCED TESTS
.CULTURE FREE TESTS
.DIAGNOSTIC TESTS
.ESSAY TESTS
.GROUP TESTS
..GROUP INTELLIGENCE TESTS
..SCREENING TESTS
.INDIVIDUAL TESTS
.LISTENING TESTS
.NONVERBAL TESTS
.NORM REFERENCED TESTS
.OBJECTIVE TESTS
..MULTIPLE CHOICE TESTS
.PERCEPTION TESTS
.PHYSICAL EXAMINATIONS
..AUDITORY VISUAL TESTS
...AUDITORY TESTS
....AUDIOMETRIC TESTS
...VISION TESTS
..SPEECH TESTS
.PRESCHOOL TESTS
.PRETESTS
.PROBLEM SETS
.PROGNOSTIC TESTS
.PSYCHOLOGICAL TESTS
..ABSTRACTION TESTS
..ASSOCIATION TESTS
..COGNITIVE TESTS
..INTELLIGENCE TESTS
...GROUP INTELLIGENCE TESTS
..MENTAL TESTS
..PERSONALITY TESTS
...AFFECTIVE TESTS
...ATTITUDE TESTS
...IDENTIFICATION TESTS
...INTEREST TESTS
...MATURITY TESTS
...PROJECTIVE TESTS
...SELF CONCEPT TESTS
..SITUATIONAL TESTS
.SCHOOL READINESS TESTS
.STANDARDIZED TESTS
.TACTUAL VISUAL TESTS
.TIMED TESTS
.VERBAL TESTS
.VISUAL MEASURES



520 Tests

Devices or procedures for measuring ability, achievement,
interest, etc., e.g., Achievement Tests, Aptitude
Tests, Cognitive Tests, Interest Tests, Language Tests,
Multiple Choice Tests, Problem Tests, Reading Tests,
Talent Identification, Test Validity, etc.



520 Tests

ABSTRACTION TESTS
ACHIEVEMENT TESTS
AFFECTIVE TESTS
ANSWER KEYS
APTITUDE TESTS
ASSOCIATION TESTS
ATTITUDE TESTS
AUDIOMETRIC TESTS
AUDITORY TESTS
AUDITORY VISUAL TESTS
BIOGRAPHICAL INVENTORIES
CLOZE PROCEDURE
COGNITIVE TESTS
COLLEGE ENTRANCE EXAMINATIONS
CREATIVITY TESTS
CRITERION REFERENCED TESTS
CULTURE FREE TESTS
DIAGNOSTIC TESTS
EQUIVALENCY TESTS
ESSAY TESTS
GROUP INTELLIGENCE TESTS
GROUP TESTS
IDENTIFICATION TESTS
INDIVIDUAL TESTS
INFORMAL READING INVENTORY
INTELLIGENCE TESTS
INTEREST TESTS
LANGUAGE TESTS
LISTENING TESTS
MATURITY TESTS
MENTAL TESTS
MULTIPLE CHOICE TESTS
NATIONAL COMPETENCY TESTS
NONVERBAL TESTS
NORM REFERENCED TESTS
OBJECTIVE TESTS
OCCUPATIONAL TESTS
PERCEPTION TESTS
PERFORMANCE TESTS
PERSONALITY TESTS
PRESCHOOL TESTS
PRETESTS
PROBLEM SETS
PROGNOSTIC TESTS
PROJECTIVE TESTS
PSYCHOLOGICAL TESTS
PUZZLES
READING READINESS TESTS
READING TESTS
SCHOOL READINESS TESTS
SCIENCE TESTS
SCREENING TESTS
SELF CONCEPT TESTS
SITUATIONAL TESTS
SPEECH TESTS
STANDARDIZED TESTS
TACTUAL VISUAL TESTS
TALENT IDENTIFICATION
TEST CONSTRUCTION
TEST SELECTION
TEST VALIDITY
TESTING PROBLEMS
TESTING PROGRAMS
TESTS
TIMED TESTS
VERBAL TESTS
VISION TESTS



FROM: ERIC CLEARINGHOUSE SCOPE OF INTEREST MANUAL

TESTS, MEASUREMENT, AND EVALUATION (TM)

ADDRESS:    ERIC Clearinghouse on Tests, Measurement, and Evaluation
            Educational Testing Service
            Rosedale Road
            Princeton, New Jersey 08540

TELEPHONE:  (609) 921-9000 X2691

ABSTRACT OF SCOPE:

Tests, scales, inventories, or other measurement devices or instruments;
test development and construction; critical review of tests; measurement
and evaluation procedures and techniques; applications and procedures of
measurement or evaluation in educational projects or programs; comparative
analysis of specific testing techniques.

APPLICABLE PHRASES AND TERMS (ALPHABETICALLY ARRANGED):

Aptitude Tests
Attitude Tests
Evaluation Procedures
Evaluation Techniques
Inventories
Measurement Procedures
Measurement Techniques
Scales
Tests

NOTE:

Documents concerned primarily with the procedures and techniques used in
a project to evaluate, measure, or test certain variables (whatever the
content, population, or level of the study itself may be) should be directed
to ERIC/TM. If, however, the interest is mainly on the subject matter,
and evaluation plays only an incidental role, the document should be forwarded
to the appropriate subject-oriented Clearinghouse.



FROM: ERIC CLEARINGHOUSE SCOPE OF INTEREST MANUAL

ERIC THESAURUS DESCRIPTORS
MOST COMMONLY USED IN INDEXING DOCUMENTS ON TESTS, MEASUREMENT, AND EVALUATION

Academic Achievement
Academic Performance
Achievement Needs
Attitude Tests
Behavioral Objectives
Classroom Observation Techniques
College Students
Correlation
Decision Making
Educational Improvement
Educational Objectives
Educational Research
Evaluation
Evaluation Criteria
Evaluation Methods
Evaluation Techniques
Factor Analysis
Item Analysis
Mathematical Models
Measurement
Measurement Instruments
Measurement Techniques
Models
Predictive Ability (Testing)
Predictive Measurement
Predictor Variables
Program Effectiveness
Program Evaluation
Questionnaires
Rating Scales
Research Methodology
Statistical Analysis
Student Attitudes
Student Evaluation
Test Construction
Test Interpretation
Test Reliability
Test Validity
Testing
Tests



With the help of the Thesaurus scope note, and any information provided
by the requestor, we might decide to use the following descriptors:

Criterion Referenced Tests
Diagnostic Tests
Prognostic Tests

Check the postings to see what quantities we're working with. Also
check the Identifier Usage Report for possible additional terms. The
following identifiers might be added to complete the first group:

Criterion Referenced Measurement
Criterion Tests
Diagnostic Reading Program
Diagnostic Reading Test
Diagnostic Reading Tests
Diagnostic Tests (Education)
Prognostic Tests (Education)

The second group of terms can largely be located in the Rotated and
Alphabetic Displays of the Thesaurus, and in the Identifier Usage Report.

California Test Bureau
Compensatory Education
Compensatory Education Programs
Elementary Secondary Education Act Title I
Elementary Secondary Education Act Title I Program
ESEA Title 1
ESEA Title I
ESEA Title I Programs
Reading Programs
Remedial Instruction
Remedial Programs
Remedial Reading
Remedial Reading Programs

Because the postings in the main group, Criterion Referenced Tests,
are not very high, it is again best to NOT out the unwanted levels, or
ignore the levels entirely, rather than use elementary level terms in an
AND function. This forms the third group.



The final search statement might look like this:

FIRST GROUP

Criterion Referenced Measurement
or

Criterion Referenced Tests
or

Criterion Tests
or

Diagnostic Reading Program
or

Diagnostic Reading Test
or

Diagnostic Reading Tests
or

Diagnostic Tests
or

Diagnostic Tests (Education)
or

Prognostic Tests
or

Prognostic Tests (Education)

AND

SECOND GROUP

California Test Bureau
or

Compensatory Education
or

Compensatory Education Programs
or

Elementary Secondary Education Act Title I
or

Elementary Secondary Education Act Title I Program
or

ESEA Title 1
or

ESEA Title I
or

ESEA Title I Programs
or

Reading Programs
or

Remedial Instruction
or

Remedial Programs
or

Remedial Reading
or

Remedial Reading Programs

BUT NOT

THIRD GROUP

High School Curriculum
or

High School Students
or

High Schools
or

Junior High School Students
or

Junior High Schools



If the main concern were judged to be the use of criterion referenced
tests in compensatory education, with the elementary level restriction not
a strong one, a simple two level search should be run, dropping the negated
third group altogether, particularly in light of the relatively low postings
to the terms in Group 1.

However, if the user definitely wants only elementary level material, a
three-way intersection could be used, substituting a positive third group of
"elementary" terms for the negated non-elementary term group.

NEW THIRD GROUP TO BE ANDed
WITH FIRST TWO GROUPS

Elementary Education
or

Elementary Grades
or

Elementary School Curriculum
or

Elementary Schools
or

Elementary School Students
or

Grade 1
or

Grade 2
or

Grade 3
or

Grade 4
or

Grade 5
or

Grade 6



The following example represents an attempt to put too many restrictions
on this search. If anything emerged, it would be the "perfect hit", but it
is much more likely to result in no hits.



FIRST GROUP

Criterion Referenced Measurement
or

Criterion Referenced Tests
or

Criterion Tests
or

Diagnostic Reading Program
or

Diagnostic Reading Test
or

Diagnostic Reading Tests
or

Diagnostic Tests
or

Diagnostic Tests (Education)
or

Prognostic Tests
or

Prognostic Tests (Education)

AND

SECOND GROUP

Compensatory Education
or

Compensatory Education Programs

AND

THIRD GROUP

ESEA Title 1
or

ESEA Title I
or

ESEA Title I Programs
or

Elementary Secondary Education Act Title I
or

Elementary Secondary Education Act Title I Program

AND

FOURTH GROUP

Reading Programs
or

Remedial Reading
or

Remedial Reading Programs

AND

FIFTH GROUP

Elementary Education
or

Elementary Grades
or

Elementary School Curriculum
or

Elementary School Students
or

Elementary Schools
or

Grade 1
or

Grade 2
or

Grade 3
or

Grade 4
or

Grade 5
or

Grade 6



This search obviously can be handled in several ways. The
search negotiation process should help the searcher determine the
best strategy to fit the user's needs. Also of importance, however, would be
the empirical results achieved. The multi-level intersections, while
sophisticated, may be too limiting to be practical. If early attempts fail
to yield any (or sufficient) hits, the searcher could easily be forced back
on a "coarser sieve" (using fewer intersections and based on the basic terms
in Group 1) in order to find more material for the user to peruse.
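
The "coarser sieve" idea can be sketched as a loop that drops the least important group each time the intersection comes back too small. This is a hypothetical illustration: the postings are invented, the groups are assumed to be listed most important first, and `coarsen_until_enough` is not a feature of any particular search system.

```python
def union(terms, postings):
    """OR logic: every record posted to at least one of the terms."""
    return set().union(*(postings.get(term, set()) for term in terms))


def coarsen_until_enough(groups, postings, minimum=1):
    """Intersect all groups; if fewer than `minimum` records survive,
    drop the last (least important) group and try again.
    Returns (number of groups used, hits)."""
    for keep in range(len(groups), 0, -1):
        hits = union(groups[0], postings)
        for group in groups[1:keep]:
            hits &= union(group, postings)
        if len(hits) >= minimum:
            return keep, hits
    return 0, set()


# Invented postings: descriptor -> set of record numbers.
postings = {
    "CRITERION REFERENCED TESTS": {1, 2, 3},
    "COMPENSATORY EDUCATION": {2, 3, 5},
    "READING PROGRAMS": {6},
}
groups = [
    ["CRITERION REFERENCED TESTS"],
    ["COMPENSATORY EDUCATION"],
    ["READING PROGRAMS"],
]

used, hits = coarsen_until_enough(groups, postings)
print(used, sorted(hits))  # 2 [2, 3]
```

In practice the decision to loosen a search belongs to the searcher and the user, not to a loop; the sketch only shows the mechanics of falling back on fewer intersections.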



IV. OUTPUT PHASE 



PAPER #16



NO HITS - WHAT TO DO? 



Let us assume that a search has been made and that there has been no
machine failure involved. In other words, the search was submitted and
processed by the computer, but no file records were found to meet the
stated search specifications. What should be done next?

The following is one checklist that might be followed:

1. Was There an Error in Coding?

There may be something technically wrong with the way the question
was asked. This is sometimes referred to as an error in "syntax", the
syntax referred to being the particular conventions of the search
program as to how queries should be translated into symbols, i.e.,
coded. For example, the number of parentheses may incorrectly be
odd instead of correctly even. What this amounts to is that the
searcher didn't really ask the question he wanted to ask.

2. Were There Typographical Errors?

Are all the index terms involved in the search spelled correctly?
Unless the searcher has the advantage of a system that validates
search terms against the Thesaurus, he may unknowingly have specified
EDUCATON rather than EDUCATION. The computer searches for exactly
what was asked for and, of course, finds nothing. In this same category
are problems involving incorrect use of blanks. A blank space is a
character like any other as far as the computer is concerned. If you
put two blanks (or no blanks) between the words of a multi-word
Descriptor, then the computer, being totally literal, looks for exactly
that. As far as garnering hits is concerned, you might as well
have written the Descriptor backwards.

3. Was the Logic Too Restrictive?

It doesn't take much experience in searching before you discover
that intersections (AND logic) drastically cut down on the number of
hits. In most systems A AND B is a format often utilized; A AND B AND C
will rarely result in a large number of hits; A AND B AND C AND D
will almost always result in no hits. (These generalizations depend,
of course, on both the number of terms assigned on the average to
each record, and on the total records in the file, but in most
bibliographic search systems they will hold true.) Check your logic
to see how many intersections are involved.

4. Did You Check the Term Usage Statistics?

Terms with very low usages should generally not be intersected
with other terms, as the probability of a hit will usually be low. It
will usually be preferable to simply ask for all the usages of an
infrequently used term (OR logic). This approach does not create
excessive hits and does not run the danger of no hits.



5. Have You Selected Alternative Approaches to the Search Topic?

Perhaps you have gone down only one trail (and that a dead-end) 
to get at what you want. There may be other trails, other Descriptors, 
quite close in meaning to the ones you selected, that the indexers have 
preferred to use in dealing with your topic. For example, you may have 
used SUMMER SCIENCE PROGRAMS, but neglected to also use SUMMER PROGRAMS 
intersected with various terms beginning with the word SCIENCE. 

6. How Have Documents Similar To The Kind You Are Seeking Been Indexed? 

Perhaps you haven't found the trail at all yet. One way to get 
started is to examine a known hit to see how it was indexed. You 
may pick up insights as to indexer approach that did not occur to 
you when contemplating the problem independently. 

7. Is There Likely To Be Material on This Topic in the Data Base?

Perhaps the no hits situation is to be expected. The data base may
simply not be likely to have material on the topic requested. For
example, COMPUTER MEMORIES, NATURAL CHILDBIRTH, etc., are not going to
be well represented in the ERIC data base, if at all. 
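
The first two checks on this list are mechanical and can be sketched in code. The functions below are hypothetical and belong to no particular search program: they test that parentheses are balanced and that each term, once its blanks are normalized, matches an entry in a small stand-in vocabulary.

```python
def balanced(query):
    """True if every opening parenthesis has a matching close."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no open before it
                return False
    return depth == 0


def normalize(term):
    """Upper-case the term and collapse any run of blanks to one."""
    return " ".join(term.upper().split())


def unknown_terms(terms, vocabulary):
    """Terms that do not match the vocabulary even after normalizing."""
    known = {normalize(entry) for entry in vocabulary}
    return [term for term in terms if normalize(term) not in known]


# Stand-in vocabulary; a real check would use the full Thesaurus.
vocabulary = ["EDUCATION", "SUMMER PROGRAMS"]

print(balanced("(A AND (B OR C)"))  # False
print(unknown_terms(["EDUCATON", "summer  programs"], vocabulary))
# ['EDUCATON']
```

Normalizing catches doubled blanks but not a missing blank (SUMMERPROGRAMS), which would still be reported as unknown.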




PAPER #17



RECALL AND RELEVANCE 



Recall and Relevance (Precision) are twin concepts that have been developed
in order to attempt to measure and evaluate the quality of searches.

Relevance, or Precision, as it has come to be called more and more, is a
measure of whether the items received as output are relevant to the original
inquiry. The decision as to whether an item is relevant can obviously be made by
several people: the searcher, the user, a panel of judges or experts, etc.
When discussing relevance it is essential to state who is making this decision.

Recall is a measure of how many of the relevant items in the file being
searched were found. To what extent was the search comprehensive; did it
exhaust the possibilities in the file? Was a lot of material left behind that
the user would have wanted?

In order to better explain these two measures, let us construct a hypothetical
situation:



Size of Total File                                  100,000 items 

Number of References in File Which Are 
Relevant to Inquiry A                                   100 items 

Number of References Retrieved by 
Actual Search                                            80 items 

Number of References Retrieved Which 
Are Judged to be Relevant to Inquiry A                   60 items 






Recall is defined as the following ratio: 

    Number of Relevant References Retrieved      60 
    ---------------------------------------  =  ---  =  60% 
    Number of Relevant References in File       100 

Relevance is defined as the following ratio: 

    Number of Relevant References Retrieved      60 
    ---------------------------------------  =  ---  =  75% 
    Total Number of References Retrieved         80 
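
The two ratios can be checked with a few lines of code. The sketch below simply restates the definitions above; the function names are chosen for illustration, and the figures are those of the hypothetical situation (100 relevant items in the file, 80 retrieved, 60 of those judged relevant).

```python
def recall(relevant_retrieved, relevant_in_file):
    """Share of the relevant items in the file that the search found."""
    return relevant_retrieved / relevant_in_file


def relevance(relevant_retrieved, total_retrieved):
    """Share of the retrieved items that were judged relevant (Precision)."""
    return relevant_retrieved / total_retrieved


# Hypothetical situation from the text.
print(f"Recall:    {recall(60, 100):.0%}")
print(f"Relevance: {relevance(60, 80):.0%}")
```

Note that the two ratios share a numerator and differ only in the denominator: Recall divides by what was in the file, Relevance by what was retrieved.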






Many studies have shown that there is an inverse relationship between 
these two measures. In other words, in order to capture the remaining relevant 
references (that were missed the first time) it is necessary to "cast the net" 
so wide that a number of irrelevant references are also retrieved. Imagine 
the following situation, for example: 



                                      IDEAL RETRIEVAL   ACTUAL RETRIEVAL 

Number of References in File Which 
Are Relevant                                100                100 

Number of References Retrieved              100                400 

Number of Retrieved References 
Which Are Judged Relevant                   100                100 



In this example, the Recall ratio has risen to 100%, but Relevance has 
dropped to 100/400 = 25%. 

Conversely, any attempt to push the Relevance ratio up tightens the 
search and inevitably sends the Recall ratio down. 

Experience suggests that a stable balance of about 65-60% Recall and 
65-80% Relevance is about the best that a system can achieve. Figure 1 depicts 
how these two measures relate to one another. 

It must be kept in mind that the negotiation with the user can often 
determine whether the searcher should strive for high Relevance or high Recall. 
In the former instance, the user loses the opportunity to make unexpectedly 
valuable "finds" among material which is partly related to his topic. In 
the latter instance, the user is being asked to accept (perhaps pay for) a 
heavy proportion of marginal material in order to cover his topic comprehensively. 

As a general rule, however, it is advisable to err on the side of achieving 
Relevance, rather than the reverse. The reasons for this are: 

1. The user will not be expecting to use the computer as a browsing device. 
The usual stereotype of the computer will lead him to think of it as a 
fast and accurate method of receiving precisely the information "asked 
for". If the user receives a lot he did not "ask for," he will begin to 
question not so much the machine as the human operator doing the search. 

2. Someone is, of course, paying for any excess retrieval; if not the patron, 
then perhaps the reference center. 

In some situations the search strategy with respect to Relevance and 
Recall may be a matter of common sense policy based on factors other than the 
user. For example, in NASA's early days, when the file was small, and search 
reports were few, there was an editorial step in which output was examined 
and winnowed before transmittal. The policy during this period was to cast a 
wide net, i.e., high Recall. Later, when the file had grown in size and the 
number of search requests was large, the editorial step was dispensed with 
for economic reasons and the policy was to aim for high Relevance and immediate 
unedited transmittal of output. 



[FIGURE 1. Relationship between Recall and Relevance] 



PAPER #18 



OUTPUT VOLUMES. WHAT IS TOO LITTLE? WHAT IS TOO MUCH? 



The answer to the question posed by this title will almost always depend 
on the user who asked the original question. I have personally seen real 
life situations where: (1) the user was hoping for 0 hits in order to verify 
that no one else was working on the topic he hoped to enter; (2) the user 
wanted about 5,000 hits in order to prove that a large government program 
of several years' duration had resulted in significant volumes of research 
reports and other documentation. 

It is very definitely a parameter that should be obtained from the user 
during the negotiation process. More often than not, if the searcher has 
a general idea of the user's anticipated or desired volume, he can control 
the search to meet this volume goal. 

In the average search, however, it should be kept in mind that as the 
volume mounts it begins to approach a point where it will exceed the ability 
of the user to encompass it, to comprehend it, to make good use of it, even 
to read the titles of each item output. This upper limit will vary somewhat 
for each user because each user has a different threshold, a different 
ability to handle large output volumes. My own experience would set this 
upper limit at around 200 hits. I try to stay under 200 hits unless the 
user has specifically indicated that a comprehensive search is desired. 

Sometimes, to avoid excessive output volumes, search systems will have 
a built-in "hit limit" restricting output to some arbitrary number, e.g., 
250, 400, 500, etc. The purpose of the "hit limit" is both to avoid 
inundating the user and to insure against a faultily constructed 
search that would otherwise "dump the file." "Hit limits" should only be 
used when the output emerges in reverse chronological sort, i.e., latest 
first. Otherwise the items that are over the limit and therefore dropped 
would be the latest and most up-to-date material. It is preferable to 
exclude the oldest hits, not the newest hits. 
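
The interaction between the hit limit and the sort order can be sketched as follows. This is a minimal illustration, not the logic of any particular search system; the record layout, accession labels, and dates are all invented for the example.

```python
from datetime import date


def apply_hit_limit(hits, limit):
    """Sort hits latest-first, then keep at most `limit` of them.

    Sorting before truncating means any hits dropped by the limit
    are the oldest ones, never the newest.
    """
    newest_first = sorted(hits, key=lambda hit: hit["date"], reverse=True)
    return newest_first[:limit]


# Hypothetical hits; accession labels and dates are invented.
hits = [
    {"accession": "DOC-1", "date": date(1968, 3, 1)},
    {"accession": "DOC-2", "date": date(1973, 6, 1)},
    {"accession": "DOC-3", "date": date(1970, 9, 1)},
    {"accession": "DOC-4", "date": date(1972, 1, 1)},
]

kept = apply_hit_limit(hits, limit=3)
print([hit["accession"] for hit in kept])
```

With a limit of three, the 1968 item is the one excluded. Had the limit been applied to an oldest-first list, the 1973 item would have been lost instead.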

It is rare that a user will complain about too few hits, if they are 
genuinely relevant. The fewer hits there are, the less work the user has 
in reviewing them. For this he is perhaps unconsciously grateful. Low 
volume output can be a problem, however, if the hits are of marginal 
interest. 

The best solution to this problem is to get the maximum amount of 
information from the user as to his problem, his application, and the end 
use of the search output. If the user wants a few items of high relevance 
for immediate use, the searcher's strategy and approach would be quite 
different than if the user wanted a comprehensive search of the file 
for the benefit of an extended state-of-the-art review. 






APPENDIX A 



ERIC VOCABULARY IMPROVEMENT PROGRAM 



I. INTRODUCTION 
A. Background 

Establishing and maintaining, with limited resources, an indexing vocabulary for 
a system which has a subject field as broad as that of ERIC and which, in addition, 
is decentralized on a subject basis, has presented a number of unique problems. At 
the inception of ERIC, for example, an effort such as Project Lex (then in progress) 
was out of the question because the expense was more than could be supported or 
justified. The decision was made to let the documents being indexed determine the 
vocabulary. 

To avoid overloading the vocabulary with seldom used, highly specific terms (such 
as personal names, test names, geographic locations, etc.), indexing was divided into 
two types: Descriptors, which would be included in a controlled, hierarchically 
structured vocabulary (Thesaurus); and Identifiers, which would be uncontrolled and 
unstructured, but which would permit use of specific indexing for precise retrieval. 

The Descriptors which had been used for indexing the Disadvantaged Collection [1] 
in mid-1966 were chosen as a core vocabulary upon which the ERIC Thesaurus could be 
built. A Descriptor Justification Form (DJF) was designed to permit entry of new 
terms with possible synonyms (UF), broader terms (BT), narrower terms (NT), and 
related terms (RT). Provisions were also made for entering scope notes, a descriptor 
group identification code, and justification for the term selection, including 
authority citations. A set of rules [2] was published, and procedures for submitting 
candidate terms were established. [3] 

Briefly, the procedures call for submittal of a DJF for a candidate term when 
and only when the term is required for indexing a document in hand. The DJF is 
prepared by the indexer from one of the ERIC Clearinghouses, reviewed by 
Clearinghouse supervision, and forwarded with a copy of the document input resume to the 
ERIC Processing and Reference Facility, which is the central switching point for 
the network. At the Facility, the DJF is reviewed and edited by a lexicographer 
for consistency, avoidance of proliferation, clarity, and conformance with rules 
and guidelines. Further review is imposed at the discretion of Central ERIC 
(National Institute of Education). 

On the whole, this procedure has been quite successful. Thesaurus growth, which 
was quite rapid during the early years, has slowed markedly in the last several 
years, and is now relatively stable at around 5,000 main (postable) terms. However, 
the vocabulary is by no means perfect. With up to 20 different organizations 
scattered across the country indexing documents and submitting candidate terms, with 
the pressures of meeting publication deadlines working against extensive research 



1. Catalog of Selected Documents on the Disadvantaged: Subject Index, ED 070 485; 
   Number & Author Index, ED 070 484. 

2. Rules for Thesaurus Preparation. OE-12047 (Superintendent of Documents, 
   Washington, D.C., Sept. 1969). 

3. ERIC Processing Manual, Thesaurus Section (ERIC Processing & Reference Facility). 






and coordination, and with substantial turnover in some Clearinghouses, it was 
inevitable that some mistakes would occur and that some less-than-optimal decisions 
would be made. Over the years, a number of shortcomings in the vocabulary have 
developed: 

o Poor, incomplete, or invalid hierarchies; 

o Synonymy - Two or more terms which, for the purposes of ERIC indexing and 
retrieval, can be considered synonyms, e.g., HEREDITY and GENETICS; 

o Poor word choices, e.g., PUBLICIZE rather than PUBLICITY; 

o Misspellings, e.g., PARODOX for PARADOX; 

o Ambiguity, e.g., prior to the introduction of PROGRAMING (BROADCAST) in 
1971, the term PROGRAMING had been applied to both computer programing 
and broadcast programing; 

o Low postings, e.g., HORIZONTAL TEXTS and VERTICAL TEXTS with one posting 
each from the 1966 Disadvantaged Collection; 

o Scattering in the Identifier file, e.g., 17 variations in entries for Title 
III of the Elementary and Secondary Education Act. 

Unfortunately, correcting most of these shortcomings is not accomplished simply, 
particularly when they have had time to "set". In the case of hierarchical defects, 
making a change is mechanically relatively easy, since only one or two DJF's are 
usually required. The problem arises in making sure that the change is in fact a 
correction, i.e., that the new structure is better than the old, and that there are 
no unwanted side effects. On the other hand, the other deficiencies are 
intellectually rather simple: you pick the preferred term and eliminate the non-preferred 
one(s). However, the implementation mechanics are complex and cumbersome. The ERIC 
software, which was designed to insure synchronization between the Resume Master Data 
Set (linear file) and the Satellite Master Data Sets (inverted files), will not permit 
the deletion of a term from the Thesaurus so long as there are documents posted to 
(indexed by) that term. Until recently, in order to delete a Thesaurus term, it was 
necessary to prepare a separate transaction for each document indexed by that term 
to delete the term from the Resume Master Data Set, and it would then be deleted 
from the Satellite Master Data Set by the system. At the same time, if you wanted 
to avoid an intolerable loss of information, a second set of transactions had to be 
prepared, replacing the deleted term with the preferred one. Since the median posting 
density of Thesaurus terms is about 50 documents per term, about 100 transactions 
would typically be required to accomplish each change. Obviously, not many changes 
could be made under those conditions. 

Recently, the ERIC Facility completed and tested software which will permit 
changes of this type to either Descriptors or Identifiers with a single 
transaction which deletes a term and transfers its postings (if desired) to another term 
(or terms). With this added capability, the ERIC network is now in a position to 
make all of the changes required to develop its vocabulary into an optimal tool for 
indexing and retrieval. This, however, is not a task which can be performed in a 
vacuum by an individual or even by a single group. Above all, the vocabulary must 
be responsive to the needs of the system it serves, and this means primarily the 
people of all components, most assuredly including the users of the system outputs. 
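
The difference in workload between the old and new procedures is simple arithmetic, sketched below. The function names are illustrative only; the figure of 50 postings is the median posting density quoted above.

```python
def transactions_old_procedure(postings):
    """Old procedure: for every document posted to the term, one
    transaction deletes the term and a second adds the preferred term."""
    return 2 * postings


def transactions_new_procedure(postings):
    """New procedure: a single transfer-and-delete transaction covers
    the term no matter how many documents are posted to it."""
    return 1


MEDIAN_POSTINGS = 50  # median posting density quoted in the text

print(transactions_old_procedure(MEDIAN_POSTINGS))
print(transactions_new_procedure(MEDIAN_POSTINGS))
```

Under the old procedure the effort grows with the number of postings, which is why heavily posted terms were the hardest to correct.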






B. Vocabulary Improvement Program 

The ERIC Vocabulary Improvement Program must be an integrated operation. A 
particular emphasis is given to system-wide participation, and vocabulary change 
recommendations of any kind are solicited from all components and users of the 
system. These include recommended changes in vocabulary conventions, vocabulary 
structure, and the basic terminology. A multi-faceted approach has been chosen 
to implement the program. There are three major facets, and these can (and must) 
be implemented in somewhat different fashions. 

o Descriptor Cross-Reference Changes. These are changes in the BT, 
NT, and RT references in the Thesaurus itself and do not affect 
directly the question of which documents are indexed by which term(s). 
Consequently, these changes have little if any impact on the existing 
data base. Further, evaluation of cross-reference changes requires 
that they be viewed in the context of the surrounding "terminology 
terrain," which requires display of, at the very least, a significant 
portion of the Thesaurus, if not its entirety. Full-scale 
coordination of cross-reference changes among ERIC users is not 
anticipated, as such activity would prove burdensome in terms of 
dissemination costs and evaluation time. 

o Descriptor Changes. These are changes to the Thesaurus indexing, as 
it exists in the data base, where a given Descriptor is removed from 
the file and its postings are transferred to one or more existing 
Descriptors or to a new Descriptor added to the file for this purpose. 
Since these changes have an immediate and significant impact on users 
of the file, as well as on day-to-day operations of the ERIC network, 
the widest practical coordination base is desired. 

o Identifier Changes. A program to detect and correct Identifier 
variations has been implemented. The data base is being corrected via 
transfer-and-delete operations. Since the Identifiers are by design 
unstructured and uncontrolled, full user coordination at the level 
required for Descriptor changes is not anticipated. 

The second and third facets encompass actual changes to the indexing terminology 
and are the subject of Section II, which follows. 






II. TERM CHANGE PROCEDURES 

Differences between Descriptors (Thesaurus terms) and Identifiers dictate 
somewhat different procedures for the implementation of changes. Descriptor changes, 
which are more closely controlled, are discussed first. 

A. Descriptor (Thesaurus) Changes 

Changes in Descriptors or Thesaurus terms will be based largely upon usage in the 
data base and upon the detection and correction of situations of postable synonyms 
appearing in the vocabulary. Also, obvious misspellings and word-form corrections 
will be required in some instances. The following paragraphs discuss Descriptor 
changes based on usage data, Descriptor changes based on the elimination of synonyms, 
and the proposed procedures to be used in actually accomplishing changes. 



1. Descriptor Editing Based on Usage. 

Descriptors that are posted very heavily (over 1000 postings) should be 
examined for their utility. Some of these Descriptors may be quite valid 
(e.g., TEACHER EDUCATION) and very reflective of the emphases in the data 
base. However, the heavy postings on some others may indicate that they 
are too general to be useful in either announcement media or manipulative 
retrieval (e.g., EDUCATIONAL PROGRAMS). It may be that Descriptors in this 
latter case should be either: (1) names of Descriptor Groups; (2) "array" 
terms, each with a scope note cautioning against its use in indexing and 
retrieval and with cross-references to more specific Descriptors constituting 
the "tops" of appropriate generic families; or (3) provided with delimiting 
scope notes to avoid ambiguous usage in the future. 

Descriptors used too infrequently tend to "clutter," unnecessarily impeding 
easy use of the data base, the indexes, and the Thesaurus. Descriptors used 
less than about five (5) times should be examined for possible removal from the 
active vocabulary, except for relatively recent additions to the vocabulary. 
"Old," low-usage Descriptors should be either: (1) converted to nonpostable 
terms, with USE references inserted and index postings transferred to the 
referred-to Descriptors; (2) deleted from the Thesaurus, but with postings 
transferred to selected Descriptors; or, very occasionally, (3) deleted entirely 
from the Thesaurus, with postings also deleted from the data base. 
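
The two usage screens described above can be sketched as a simple filter over posting counts. This is only an illustration: the posting counts are invented (apart from the single postings on HORIZONTAL TEXTS and VERTICAL TEXTS mentioned earlier), NEW TERM EXAMPLE is a hypothetical recent addition, and the function only flags candidates for examination; it does not decide their disposition.

```python
HEAVY_POSTINGS = 1000  # "over 1000 postings"
LOW_POSTINGS = 5       # "less than about five (5) times"


def flag_for_review(posting_counts, recent_terms=frozenset()):
    """Return (heavily posted terms, low-usage terms) to be examined.

    Recent additions are exempt from the low-usage screen because
    they have not yet had time to build up postings.
    """
    heavy = sorted(
        term for term, n in posting_counts.items() if n > HEAVY_POSTINGS
    )
    low = sorted(
        term
        for term, n in posting_counts.items()
        if n < LOW_POSTINGS and term not in recent_terms
    )
    return heavy, low


# Posting counts are invented for illustration.
posting_counts = {
    "TEACHER EDUCATION": 2500,
    "EDUCATIONAL PROGRAMS": 3100,
    "GENETICS": 40,
    "HORIZONTAL TEXTS": 1,
    "VERTICAL TEXTS": 1,
    "NEW TERM EXAMPLE": 2,  # hypothetical recent addition
}

heavy, low = flag_for_review(posting_counts, recent_terms={"NEW TERM EXAMPLE"})
print(heavy)
print(low)
```

Both lists are only starting points: a heavily posted term such as TEACHER EDUCATION may prove quite valid on examination.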

2. Elimination of Postable Synonyms. 

If postable synonyms exist in the Thesaurus, some documents will be indexed 
by one such synonym and some by the other. Retrieval via one synonym will thus 
be incomplete. Such a condition is highly undesirable. Instances of postable 
synonyms must be detected, preferred versions selected, USE references to 
these preferred versions created, and data-base postings transferred from the 
nonpreferred term to the preferred Descriptor. 

3. Thesaurus Change Procedure. 

The flow chart of the Thesaurus Change Procedure is shown in Figure 1. 
While it is anticipated that many recommendations for change will originate 
from the day-to-day work of the Facility Lexicographer (e.g., with term 
cross-references), change recommendations are solicited from the entire ERIC network 
and all users of the system. All changes, whether from internal or external 
sources, will be processed as shown in Figure 1. 

[FIGURE 1. THESAURUS CHANGE PROCEDURE (flow chart). Legible labels: ERIC Network; 
Panel Review; Vocabulary Review Group; Central ERIC; ERIC Facility; Lexicography.] 

a. Kinds of Changes 

There are a number of different kinds of changes which can be made: 

o Simple Merge - Used to eliminate synonyms and to post low-use 
terms to the next higher generic level. 

Examples: Transfer postings on HEREDITY to GENETICS 
          Transfer postings on GIRLS CLUBS to YOUTH CLUBS 

o Word Change - Used to correct misspellings and change word 
forms. 

Examples: Transfer postings on PARODOX to (new term) PARADOX 
          Transfer postings on PUBLICIZE to (new term) PUBLICITY 

o Multiple Merge - Used to eliminate multiple synonyms or to 
post several low-use terms to the next higher generic level. May 
include word change. 

Examples: Transfer postings on MARKS and GRADES (REPORT) to 
          GRADES (SCHOLASTIC) 
          Transfer postings on QUICHE and YUCATEC to MAYAN 
          LANGUAGES 
          Transfer postings on HETEROPHORIA and HETEROTROPIA 
          to (new term) STRABISMUS 

o Term Split - Used to post low-use terms to two (or more) more 
general terms (not necessarily broader terms of the term in 
question), when transfer to the next higher generic level might 
result in significant information loss. The receiving terms 
can then be coordinated for searching to retain specificity. 

Examples: Transfer terms on FORESTRY OCCUPATIONS to FORESTRY 
          and AGRICULTURAL OCCUPATIONS 
          Transfer terms on OCEAN ENGINEERING to OCEANOLOGY 
          and ENGINEERING 

o Simple Delete - Used to remove terms which have been added to 
the Thesaurus erroneously, or which have proved to have no 
utility. 

Examples: Delete postings from SATELLITE LABORATORIES 
          Delete postings from HORIZONTAL TEXTS 
          Delete postings from VERTICAL TEXTS 
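
The kinds of changes listed above differ only in how many terms give up postings and how many receive them. The sketch below models this on a toy inverted index. It is an illustration only, not the ERIC transfer-and-delete software: the function `transfer_and_delete`, the index layout, and the document labels D1 through D11 are all assumptions made for the example.

```python
def transfer_and_delete(index, sources, targets):
    """Move the postings of every source term to every target term,
    then drop the source terms from the index.

    One source, one target -> simple merge or word change
    Many sources, one target -> multiple merge
    One source, many targets -> term split
    No targets               -> simple delete
    """
    moved = set()
    for term in sources:
        moved |= index.pop(term, set())
    for term in targets:
        index.setdefault(term, set()).update(moved)
    return index


# Toy inverted index; document labels are invented.
index = {
    "HEREDITY": {"D1", "D2"},
    "GENETICS": {"D2", "D3"},
    "PARODOX": {"D4"},
    "MARKS": {"D5"},
    "GRADES (REPORT)": {"D6"},
    "GRADES (SCHOLASTIC)": {"D7"},
    "FORESTRY OCCUPATIONS": {"D8"},
    "FORESTRY": {"D9"},
    "AGRICULTURAL OCCUPATIONS": {"D10"},
    "HORIZONTAL TEXTS": {"D11"},
}

transfer_and_delete(index, ["HEREDITY"], ["GENETICS"])   # simple merge
transfer_and_delete(index, ["PARODOX"], ["PARADOX"])     # word change
transfer_and_delete(
    index, ["MARKS", "GRADES (REPORT)"], ["GRADES (SCHOLASTIC)"]
)                                                        # multiple merge
transfer_and_delete(
    index, ["FORESTRY OCCUPATIONS"], ["FORESTRY", "AGRICULTURAL OCCUPATIONS"]
)                                                        # term split
transfer_and_delete(index, ["HORIZONTAL TEXTS"], [])     # simple delete

# Coordinating (ANDing) the two receiving terms recovers the document
# that was indexed by the split term, so specificity is retained.
print(index["FORESTRY"] & index["AGRICULTURAL OCCUPATIONS"])
```

The final line shows why the term split loses less information than a transfer to a single broader term: the original concept can still be reached by coordinating the two receiving terms.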






The transfer-and-delete programs automatically generate transactions 
to purge (delete) terms from the Thesaurus when their postings are 
transferred. If a cross-reference is desired (e.g., HEREDITY Use GENETICS), 
this must be added separately. If a term deleted from the Thesaurus 
should be used as an Identifier, it is necessary (at the present time) 
to generate/add a separate Identifier transaction for each document 
indexed by the term to retain the postings. 

b. Change Recommendations 

(1) Information Required - In order for the change to be processed 
efficiently, the following information is required for each change 
proposed: 

o Statement of the Desired Change - Simple, imperative sentences 
like those used in the examples above are preferred. 

o The Number of Postings - Required for each term involved in 
the change. This information may be obtained from the publication 
ERIC Descriptor and Identifier Usage Report. The date (month/ 
year) of the postings count should be noted. For changes other 
than synonyms and word form changes, the accession number of 
the last known document indexed by each term is also desired. 
The latest accession number will indicate the timeliness of the 
terminology in question. 

o Reason for Change - e.g., eliminate synonyms, correct spelling, 
etc. 

o Justification for the Change - Unless the change is a correction 
of an obvious error, such as a misspelling (PARODOX/PARADOX), 
justification for the change must be supplied. Authorities for 
definitions should be indicated. Generally, in the case of 
synonyms, postings will be transferred to the term with the 
larger number of postings. If a given recommendation is to 
reverse this practice, the reasons for doing so must be explicit 
to justify the added expense. The timeliness of terminology 
should be examined before recommending the transfer of postings 
to a higher generic level, or a simple deletion. Low use is 
not per se sufficient reason for deletion; a certain amount of 
time has to be allowed for a new term to build up postings. 

(2) Submittal of Recommendations - Change recommendations do not have 
to be in any particular format, so long as the required information is 
included. Recommendations should be addressed to: 

ERIC Processing and Reference Facility 
ATTN: Lexicographer 
4833 Rugby Avenue, Suite 303 
Bethesda, Maryland 20014 






c. Evaluation and Edit 

The Lexicographer will evaluate incoming Change Recommendations and 
separate them into three categories as follows: 

o Category 1 - Significant, having major impact on indexing 
and/or potentially controversial. 

o Category 2 - Obviously necessary or useful, having minor impact 
and/or not likely to be contested. 

o Category 3 - Obviously trivial, contrary to rules, or 
insufficient justification or support for the change. 

It is anticipated that virtually all of the Change Recommendations will 
fall into Category 1 and be processed through the coordination procedure 
described in the following paragraphs. However, a few Category 2 
changes can be expected, and Category 3 changes, while not anticipated, 
are possible. The Lexicographer will record all Category 2 and 3 
change proposals in a list-form report which will be reviewed by the 
Thesaurus Advisory Panel (see paragraph g. below). 

d. Change Notice 

The Lexicographer will prepare for each Category 1 Change 
Recommendation a Thesaurus Term Change Notice (Figure 2, Form EFF-21), completing 
sections 1 through 3. The forms will then be duplicated and distributed 
in two (2) copies to each member of the ERIC Vocabulary Review Group. 

e. Vocabulary Review Group Responsibilities 

Each member of the Vocabulary Review Group designates a responsible 
individual (Vocabulary Coordinator) to review all Change Notices, 
coordinating internally as desired. The membership of this group has 
been chosen to achieve the broadest possible coordination base consis- 
tent with efficient operation. 

The Vocabulary Coordinator will review each Change Notice as received, 
complete the RECOMMENDED ACTION section, sign the form, and return one 
(1) copy to the Lexicographer at the ERIC Facility within two (2) weeks 
of receipt. This deadline is established to avoid unwarranted delays 
or excessive follow-up. Failure to respond within the time limit 
established will be treated as an indication of concurrence or lack 
of interest in the change. 

A written invitation to join the Vocabulary Review Group (from C.W. Hoover, Chief, 
ERIC) was distributed in late June 1973 to a total of approximately 60 organizations. 
A total of 35 organizations responded favorably to this invitation, indicating interest 
in the Vocabulary Improvement Program and designating individuals who would 
participate. These 35 organizations make up the existing Vocabulary Review Group; their 
composition includes 16 ERIC Clearinghouses, 10 university libraries, and 9 agencies 
of state education departments. 



THESAURUS TERM CHANGE NOTICE                                  No. 

1. PROPOSED CHANGE 

2. IMPACT 

   a. POSTINGS BEFORE CHANGE          b. POSTINGS AFTER CHANGE 
      Term          Postings             Term          Postings 

3. REASON FOR CHANGE (Include full justification, citing authorities for 
   definition, usage, and treatment) 

RECOMMENDED ACTION 

   [ ] CONCUR     [ ] NO INTEREST 
   [ ] OBJECT (State reasons in full detail, including potential impact upon 
       input or retrieval operations showing significant loss of information. 
       Cite authorities as appropriate.) 

Signed: 
        Vocabulary Coordinator                 Organization 

RETURN PRIOR TO            To: ERIC Processing and Reference Facility 
                               ATTN: Lexicographer 
                               4833 Rugby Avenue, Suite 303 
                               Bethesda, Maryland 20014 
EFF-21 (8/72) 

FIGURE 2. 

f. Tabulation of Responses 

The Facility Lexicographer will tabulate responses to each Change Notice 
as they are received. After the cut-off date, the objections to each 
change will be counted. 

o Significant Objections - If there are five (5) or more objections, 
the Change Notice will be set aside for review by the Thesaurus 
Advisory Panel at its next session. 

o None/Few Objections - If there are fewer than five (5) objections, 
the Lexicographer will examine the objections received to 
determine whether or not the change is likely to have a critical 
impact on an objector's operations. If so, the Change Notice 
will be set aside for review by the Thesaurus Advisory Panel. 
If not, the change will be entered into the system. 
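
The tabulation rule above amounts to a two-step decision, sketched below. The function name and the return strings are illustrative only; whether an objection involves a critical impact remains the Lexicographer's judgment and is simply passed in as a flag here.

```python
OBJECTION_THRESHOLD = 5  # "five (5) or more objections"


def disposition(objections, critical_impact=False):
    """Decide what happens to a Change Notice after the cut-off date.

    objections      -- number of OBJECT responses received
    critical_impact -- the Lexicographer's judgment that the change is
                       likely to critically affect an objector's operations
    """
    if objections >= OBJECTION_THRESHOLD:
        return "panel review"
    if critical_impact:
        return "panel review"
    return "enter change"


print(disposition(6))                        # significant objections
print(disposition(2, critical_impact=True))  # few, but critical
print(disposition(2))                        # few, not critical
```

Non-responses need no separate handling in this sketch, since failure to respond is treated as concurrence or lack of interest and so never adds to the objection count.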

g. Thesaurus Advisory Panel Review 

(1) Schedule - The Thesaurus Advisory Panel will confer at least 
quarterly, usually during the third week of January, April, July, and October. 
Additional meetings may be scheduled, as necessary. This schedule 
is timed to permit the decisions of the Panel to be incorporated 
into the file prior to release of cumulative indexes, the quarterly 
Thesaurus updates, the quarterly ERICTAPE updates, and the annual 
issuance of the ERIC Descriptor and Identifier Usage Report. 

(2) Agenda - Depending upon the material available for consideration, 
the Panel will take action in the following areas: 

o Review, approve, disapprove, or modify changes in Group Codes 
and cross-reference structure. 

o Examine the list-form report of Category 2 and Category 3 Change 
Recommendations; confirm or reverse decisions, or re-classify 
items to Category 1 for full coordination. 

o Consider all Change Notices to which five (5) or more objections 
have been received as well as those judged critical; examine 
pros and cons of each change in the light of total system needs, 
and determine disposition. 

o Discuss other vocabulary-related matters, including such future 
plans and programs as: rules changes or clarifications, format 
changes, and publications changes. 






The Thesaurus Advisory Panel includes a Chairperson (Central ERIC), a lexicographer 
(ERIC Facility), an ERIC Clearinghouse representative, and 5-7 other members selected 
from public and private agencies. As of this writing, the final composition of this 
group has not been determined. 






h. Implementation and Dissemination 

Immediately after each Panel meeting, approved changes will be implemented, 
using the transfer-and-delete software, so that changes will be 
incorporated in the next edition of the Thesaurus, RIE cumulative indexes, etc. 
In addition, lists of all changes will be incorporated in the next 
issues of ERIC Management Notes and Interchange. 

B. Identifier Changes 

1. Identifier Scattering. 

ERIC Identifiers are essentially uncontrolled, and a great many synonyms 
have crept into the Identifier list over the years, e.g., different forms and/ 
or abbreviations and/or syntactical variants of names of organizations. It is 
often impractical for the user, with this state of affairs, to insure that he 
has detected all variants of a particular Identifier. A program to detect and 
correct Identifier variations has been implemented. For each set of Identifier 
synonyms, a preferred version is being selected. The data base will be corrected 
via transfer-and-delete operations. Since Identifiers are by design neither 
structured nor controlled, full user coordination will not be required. 

2. Sources and Information Required. 

Identifier Change Recommendations (as with Descriptors) are solicited from 
all ERIC components and users. Change recommendations should generally conform 
to the pattern specified for Descriptors, except that justifications need not 
be as complete. 

3. Review and Edit. 

Identifier Change Recommendations will be reviewed and edited by the Facility 
Lexicographer and automatically assigned to either Category 2 or 3. 

4. Coordination. 

A list-form report of the Lexicographer's decisions on Identifier Change 
Recommendations will be submitted to the Thesaurus Advisory Panel along with 
the corresponding Descriptor (Thesaurus) Change Recommendations for confirmation, 
reversal, or re-classification. Re-classified Identifier Change Recommendations 
will be fully coordinated in the same manner as Thesaurus Change Notices. 

5. Implementation and Dissemination. 

Implementation and dissemination will be accomplished in the same manner as 
Descriptor changes. 







THESAURUS 



TERM CHANGE 



NOTICE 



No. 



1. PROPOSED CHANGE 



Transfer postings on PERSONAL RELATIONSHIP to INTEf^PERSONAL 



RELATIONSHIP. Retain PERSONAL RELATIONSHIP as UF to INTERPERSONAL RELATIONSHIP, 



2 m^PACT 

d POSTINGS BEFORE. CHANGE (Dec '72, RIE) 



Term 



PERSONAL RELATIONSHIP (^f.ajor) 

(Miinor) 

INTERPERSONAL RELATIONSHIP (Major) 
" " : (Minor) 



Postings 

6 

M 
139 
275 



b. POSTINGS AFTER CHANGE 
Term 



' Postrngs 

INTERPERSONAL RELATIONSHIP (Major) 1^+5 
'» " (Minor) 292 



3. REASON FOR CHANGE (Include full justification, citing authorities for definitions, usage, and treatment)

Both PERSONAL RELATIONSHIP and INTERPERSONAL RELATIONSHIP are very old descriptors, dating
back to the Phase I ERIC Thesaurus (pre-1968). Originally, the two terms were not cross-
referenced, indicating that one (the second to be entered) was added without knowledge of
the other; currently, INTERPERSONAL RELATIONSHIP is the broader term. PERSONAL RELATIONSHIP
might conceivably be used to refer to a more basic or intimate relationship (especially
between two people) than INTERPERSONAL RELATIONSHIP might imply. However, this distinction
is unnecessary for an educational vocabulary. See "Interpersonal" and "Personal" in
English & English's Comprehensive Dictionary of Psychological & Psychoanalytical Terms.
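
The posting arithmetic behind a transfer of this kind can be sketched in a few lines of Python. This is only an illustration, not part of the ERIC procedure: the `Descriptor` class and `transfer_postings` function are hypothetical names, the Minor count of 17 for PERSONAL RELATIONSHIP is inferred from the totals on the form (292 less 275), and the sketch assumes that no document carries both terms, so that counts simply add.

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One Thesaurus term with its posting counts (hypothetical structure)."""
    name: str
    major: int = 0
    minor: int = 0
    used_for: set = field(default_factory=set)  # UF (Used For) references


def transfer_postings(thesaurus, old, new):
    """Move all postings from `old` to `new` and keep `old` as a UF of `new`.

    Assumes no document is posted under both terms, so counts add directly.
    """
    source = thesaurus.pop(old)       # the old term stops being a descriptor
    target = thesaurus[new]
    target.major += source.major
    target.minor += source.minor
    target.used_for.add(old)          # retained only as a cross-reference
    return target


# Figures from the notice above (Dec '72, RIE); the 17 is inferred.
thesaurus = {
    d.name: d
    for d in (
        Descriptor("PERSONAL RELATIONSHIP", major=6, minor=17),
        Descriptor("INTERPERSONAL RELATIONSHIP", major=139, minor=275),
    )
}
merged = transfer_postings(
    thesaurus, "PERSONAL RELATIONSHIP", "INTERPERSONAL RELATIONSHIP"
)
print(merged.major, merged.minor)  # prints: 145 292
```

The result matches the "Postings After Change" figures on the form, which suggests the form's totals are plain sums of the two terms' counts.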



RECOMMENDED ACTION

□ CONCUR

□ NO INTEREST

□ OBJECT (State reasons in full detail, including potential impact upon input or retrieval operations showing significant
loss of information. Cite authorities as appropriate.)



Vocabulary Coordinator

Organization

RETURN PRIOR TO: [handwritten date, illegible]

To: ERIC Processing and Reference Facility
ATTN: Lexicographer
4833 Rugby Avenue, Suite 303
Bethesda, Maryland 20014



THESAURUS TERM CHANGE NOTICE

No.



1. PROPOSED CHANGE

Change PLANNING (FACILITIES), with its UF "Facilities Planning," to FACILITY PLANNING.



2. IMPACT

a. POSTINGS BEFORE CHANGE          b. POSTINGS AFTER CHANGE

Term          Postings             Term          Postings

Not Applicable



3. REASON FOR CHANGE (Include full justification, citing authorities for definitions, usage, and treatment)

The term PLANNING (FACILITIES) does not conform to Item 1.1.3. J of the ERIC Rules for
Thesaurus Preparation. This rule states: "A parenthetical qualifier identifies any
particular indexable meaning of a homograph. One of the reasons for restricting the
use of parenthetical qualifiers to homographs is to preclude the use of inverted
entries." The proposed term FACILITY PLANNING is in accord with this rule and is
consistent with the rest of the "facility" terms in the Thesaurus.



RECOMMENDED ACTION

□ CONCUR          □ NO INTEREST

□ OBJECT (State reasons in full detail, including potential impact upon input or retrieval operations showing significant
loss of information. Cite authorities as appropriate.)



Signed

Vocabulary Coordinator

Organization

RETURN PRIOR TO






To: ERIC Processing and Reference Facility
ATTN: Lexicographer
4833 Rugby Avenue, Suite 303
Bethesda, Maryland 20014



THESAURUS TERM CHANGE NOTICE

No.



1. PROPOSED CHANGE 

Delete "Morals" as UF to ETHICS. Add MORALS as descriptor. Transfer postings on MORAL
VALUES and ETHICAL VALUES to new term MORALS. Retain "Moral Values" and "Ethical Values"
as UF's to MORALS.



2. IMPACT

a. POSTINGS BEFORE CHANGE (Dec '72, RIE)

Term                  Postings
ETHICS             [illegible]
MORALS                       0
ETHICAL VALUES              95
MORAL VALUES               130

b. POSTINGS AFTER CHANGE

Term                  Postings
ETHICS             [illegible]
MORALS                     225



3. REASON FOR CHANGE (Include full justification, citing authorities for definitions, usage, and treatment)

ETHICAL VALUES and MORAL VALUES are old Phase I Thesaurus terms that were never
cross-referenced. They were probably entered as a result of free indexing, and without
the benefit of lexicographic analysis. ETHICS (UF "Morals") was entered much later and
structured using the LEX Thesaurus. The ambiguity and inconsistency among these terms
could be eliminated with the above change and the addition of the following Scope Notes:

ETHICS....Study of the ideal in human character and conduct.

MORALS....Individual/group standards of conduct in terms of right or wrong, or
actual conduct with reference to such standards.

See ETHICS and MORALS in English & English's Comprehensive Dictionary of Psychological &
Psychoanalytical Terms.



RECOMMENDED ACTION

□ CONCUR

□ NO INTEREST

□ OBJECT (State reasons in full detail, including potential impact upon input or retrieval operations showing significant
loss of information. Cite authorities as appropriate.)



Signed:

Vocabulary Coordinator

Organization

RETURN PRIOR TO: October [day illegible], 1973

To: ERIC Processing and Reference Facility
ATTN: Lexicographer
4833 Rugby Avenue, Suite 303
Bethesda, Maryland 20014



THESAURUS TERM CHANGE NOTICE

No.



1. PROPOSED CHANGE 

Transfer postings on TEACHER EXPERIENCE to TEACHING EXPERIENCE. Retain TEACHER EXPERIENCE
as UF to TEACHING EXPERIENCE, but drop the current UF "Professional Laboratory Experience."



2. IMPACT

a. POSTINGS BEFORE CHANGE (Dec '72, RIE)

Term                              Postings
TEACHER EXPERIENCE (Major)              36
TEACHER EXPERIENCE (Minor)             113
TEACHING EXPERIENCE (Major)             23
TEACHING EXPERIENCE (Minor)             39

b. POSTINGS AFTER CHANGE

Term                              Postings
TEACHING EXPERIENCE (Major)    [illegible]
TEACHING EXPERIENCE (Minor)            152



3. REASON FOR CHANGE (Include full justification, citing authorities for definitions, usage, and treatment)

TEACHING EXPERIENCE was added to the Thesaurus in late 1969. It was believed that the
existing descriptor TEACHER EXPERIENCE with its UF "Professional Laboratory Experience"
was insufficient to express the idea of both preservice and inservice professional
teaching experience taking place either in or out of a laboratory. This was a fallacy
in that the UF should not have been construed as a delimiter. Thus, the new term was
added in error. Some ambiguity will be eliminated by merging the postings and retaining
the Scope Note for TEACHING EXPERIENCE: "Actual and simulated experience of preservice
and inservice teachers" (Good's Dictionary of Education). Further ambiguity will be
eliminated by adding a Scope Note to the descriptor TEACHER BACKGROUND. This Scope Note
would simply state "Experience other than teaching."



RECOMMENDED ACTION

□ CONCUR

□ NO INTEREST

□ OBJECT (State reasons in full detail, including potential impact upon input or retrieval operations showing significant
loss of information. Cite authorities as appropriate.)



Vocabulary Coordinator

Organization

RETURN PRIOR TO

To: ERIC Processing and Reference Facility
ATTN: Lexicographer
4833 Rugby Avenue, Suite 303
Bethesda, Maryland 20014



THESAURUS TERM CHANGE NOTICE

No.



1. PROPOSED CHANGE

Transfer postings on HETEROPHORIA and HETEROTROPIA to new term STRABISMUS. Retain 
HETEROPHORIA and HETEROTROPIA as UF's to STRABISMUS. 



2. IMPACT

a. POSTINGS BEFORE CHANGE (Dec '72, RIE)

Term             Postings
HETEROPHORIA            2
HETEROTROPIA            1

b. POSTINGS AFTER CHANGE

Term             Postings
STRABISMUS              3



3. REASON FOR CHANGE (Include full justification, citing authorities for definitions, usage, and treatment)

Two very specific terms (entered 2/68) with very few postings will be merged into a new,
broader term. Both HETEROPHORIA and HETEROTROPIA refer to tendencies of the eyes to turn
away from the position correct for binocular vision, but HETEROPHORIA is a "latent"
imbalance or deviation in contrast to HETEROTROPIA or a "manifest" imbalance. STRABISMUS
or "squint" is usually associated with HETEROTROPIA. It can, however, take a broader
meaning (see "squint," Stedman's Medical Dictionary). As a new term, STRABISMUS will be
scoped as follows:

STRABISMUS....Lack of coordination of eye muscles so that the two eyes do not
focus on the same point.

In addition, the following current UF's to HETEROPHORIA and HETEROTROPIA will be dropped
from the Thesaurus as they are not likely to appear in educational literature without
reference to a more generic concept: "Cyclophoria," "Esophoria," "Esotropia," "Exophoria,"
"Exotropia," "Hyperphoria," "Hypertropia," "Hypophoria," and "Hypotropia." The UF's
"Cross Eyes" and "Walleyes" will be retained.



RECOMMENDED ACTION

□ CONCUR

□ NO INTEREST

□ OBJECT (State reasons in full detail, including potential impact upon input or retrieval operations showing significant
loss of information. Cite authorities as appropriate.)



Vocabulary Coordinator

Organization

RETURN PRIOR TO: October [day illegible], 1973

To: ERIC Processing and Reference Facility
ATTN: Lexicographer
4833 Rugby Avenue, Suite 303
Bethesda, Maryland 20014



THESAURUS TERM CHANGE NOTICE

No.



1. PROPOSED CHANGE

Delete the descriptors HORIZONTAL TEXTS and VERTICAL TEXTS. 



2. IMPACT

a. POSTINGS BEFORE CHANGE (Dec '72, RIE)          b. POSTINGS AFTER CHANGE

Term                 Postings                     Term          Postings

HORIZONTAL TEXTS            1

VERTICAL TEXTS              1



3. REASON FOR CHANGE (Include full justification, citing authorities for definitions, usage, and treatment)

These ancient terms were established as descriptors in 1966. They refer to formats of
programed texts. Each has been used only one time, and for the same document. They will
be replaced by the more generic descriptor PROGRAMED TEXTS for that one document. They
will not be retained as UF's.
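
A deletion of this kind, where retired descriptors are swapped for a more generic one on the affected documents, can be sketched as follows. The function names and the document identifiers `DOC-A` and `DOC-B` are hypothetical and purely illustrative; the sketch assumes each document's index terms are held as a set, so a document that carried both retired terms receives the replacement only once.

```python
def replace_descriptors(index, retired, replacement):
    """Return a new index in which retired terms are swapped for `replacement`.

    `index` maps a document identifier to the set of descriptors assigned to it.
    The retired terms are not kept in any form (no UF cross-reference).
    """
    retired = set(retired)
    updated = {}
    for doc_id, terms in index.items():
        if terms & retired:
            # Drop every retired term, then add the generic one exactly once.
            updated[doc_id] = (terms - retired) | {replacement}
        else:
            updated[doc_id] = set(terms)
    return updated


def count_postings(index, term):
    """Number of documents indexed under `term`."""
    return sum(1 for terms in index.values() if term in terms)


# Hypothetical index: one document carries both retired terms, as in the notice.
index = {
    "DOC-A": {"HORIZONTAL TEXTS", "VERTICAL TEXTS"},
    "DOC-B": {"PROGRAMED TEXTS"},
}
updated = replace_descriptors(
    index, ["HORIZONTAL TEXTS", "VERTICAL TEXTS"], "PROGRAMED TEXTS"
)
print(count_postings(updated, "PROGRAMED TEXTS"))   # prints: 2
print(count_postings(updated, "HORIZONTAL TEXTS"))  # prints: 0
```

Because both retired terms sat on the same document, the generic descriptor gains one posting rather than two.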



RECOMMENDED ACTION

□ CONCUR          □ NO INTEREST

□ OBJECT (State reasons in full detail, including potential impact upon input or retrieval operations showing significant
loss of information. Cite authorities as appropriate.)



Vocabulary Coordinator          Organization

RETURN PRIOR TO: October [day illegible], 1973

To: ERIC Processing and Reference Facility
ATTN: Lexicographer
4833 Rugby Avenue, Suite 303
Bethesda, Maryland 20014



